Abstract

This article provides capacity expressions for multi-user and multi-cell wireless communication schemes when 
the transmitters and receivers are equipped with multiple antennas and when the transmission channel has a certain 
correlation profile. In mathematical terms, this contribution provides novel deterministic equivalents for the Stieltjes 
and Shannon transforms of a class of large dimensional random matrices. These results are of practical relevance to 
evaluate the rate performance of communication channels with multiple users, multiple cells and with transmit and 
receive correlation at all communication pairs. In particular, we analyse the per-antenna achievable rates for these 
communication systems which, for practical purposes, are a relevant measure of the trade-off between rate performance 
and operating cost of every antenna. We study specifically the per-antenna rate regions of (i) multi-antenna multiple 
access channels and broadcast channels, as well as the capacity of (ii) multi-antenna multi-cell communications 
with inter-cell interference. Theoretical expressions of the per-antenna mutual information are obtained for these 
models, which extend previous results on multi-user multi-antenna performance without channel correlation to the 
more realistic Kronecker channel model. From an information theoretic viewpoint, this article provides, for scenario 
(i), a deterministic approximation of the per-antenna rate achieved in every point of the MAC and BC rate regions, a 
deterministic approximation of the ergodic per-antenna capacity with optimal precoding matrices in the uplink MAC 
and an iterative water-filling algorithm to compute the optimal precoders, while, for scenario (ii), this contribution 
provides deterministic approximations for the mutual information of single-user decoders and the capacity of minimum 
mean square error (MMSE) decoders. An original feature of this work is that the deterministic equivalents are proven 
asymptotically exact, as the system dimensions increase, even for strong correlation at both communication sides. 
The above results are validated by Monte Carlo simulations. 



I. Introduction 

When mobile networks were expected to run out of power and frequency resources while being simultaneously 
subject to a demand for higher transmission rates, Foschini [1] introduced the idea of multiple input multiple 
output (MIMO) systems and Telatar [2] predicted a growth of the capacity performance by a factor min(N, n), 
compared to single-antenna schemes, for communications between an n-antenna transmitter and an N-antenna 
receiver. This capacity gain stands when the propagation channel matrix model is formed of independent and 
identically distributed (i.i.d.) Gaussian entries. In practical systems though, this linear multiplexing gain can only 
be achieved for large signal-to-interference plus noise ratios (SINR) and for uncorrelated transmit and receive 
antenna arrays at both communication sides. Today, in spite of this remark, the scarcity of available frequency 
resources has led to a widespread incentive for MIMO communications. Mobile terminal designers now embed 
more and more antennas in small devices. Due mainly to space limitations, this inevitably spawns non-negligible 
channel correlation and, thus, non-negligible effects on the achievable transmission rates. Since MIMO systems 
come along with a tremendous increase in signal processing requirements and, therefore, an even larger increase in 
power consumption, both infrastructure and mobile terminal manufacturers need to accurately assess the exact cost 
of increasing the achievable bit rates by adding more antennas on volume limited devices. The analysis of the exact 
throughput gain incurred by extra antennas is therefore paramount to evaluate the energy efficiency of multi-antenna 
devices. The first purpose of the present article is to evaluate the per-antenna achievable rate, which we further 
refer to as the antenna efficiency, for different communication models involving multiple users or multiple cells. 
The antenna efficiency criterion comes in line with the current incentive for energy-efficient communications, which 
are expected to dominate future telecommunication research. 

Multi-cell and multi-user systems are among the scenarios of main interest to cellular service providers. Although 
alternative communication models could be treated, the present article investigates the following two wireless 
communication systems: 

1) multiple access channels (MAC) in which K mobile terminal users transmit information to a unique receiver, 
hereafter referred to as the base station, and the dual broadcast channels (BC) in which the base station 
multi-casts information to the K terminal users. While the major scientific breakthroughs in multi-antenna 
broadcast channels are quite recent, e.g. [19], the practical applications are foreseen to arise in the near future, 
with e.g. the 3GPP long term evolution standard [3]; 

2) single-user decoding and minimum mean square error (MMSE) decoding [7] in multi-cell scenarios. In most 
current mobile communication systems, the wireless networks are composed of multiple overlapping cells, 
controlled by non-cooperating base stations. Under these conditions, the achievable rates for every user in 
a cell, assuming no intra-cell interference, are the capacity of the single-user decoding scheme in which 
interfering signals are treated as Gaussian noise with a known variance. However, single-user decoders are 
difficult to implement and are often replaced in practical applications by suboptimal linear decoders, such as 
linear MMSE decoders. These decoders are attractive as they are known to maximize the signal-to-interference 
plus noise ratio (SINR) experienced at the receiver. 

The achievable rate regions of the multi-antenna MAC and BC have been known since the successive contributions 
[19]-[20], which established an important duality link between the MAC rate regions and the BC rate regions in 
both single antenna and MIMO channels, when the instantaneous channel realizations are assumed to be perfectly 
known at both communication sides. To achieve perfect channel state information at both communication ends, 
the channel must be somewhat static during a sufficiently long period and is often referred to as a block-fading 
channel. Overall, communication channels can often be modelled as particular realizations of a stochastic process, 
in which case it is convenient to identify the parameters in the stochastic model that account for the communication 
rate performance. The mathematical field of large dimensional random matrices is particularly suited to this end, 
as it can provide approximations of achievable rates as a function of the relevant channel parameters only, e.g. 
as a function of the long-term transmit and receive channel covariance matrices in the present situation, or as a 
function of the deterministic line of sight components in Rician models. The earliest notable result in line with 
the present study is due to Tulino et al. [4], who provide an expression of the asymptotic mutual information of 
point-to-point MIMO communications when the random channel matrix is composed of i.i.d. Gaussian entries. The 
authors also provide an expression of the ergodic capacity-achieving power allocation policy at the base station. 
In [38], Hochwald et al. derive a central limit result of the asymptotic capacity result obtained in [4], providing 
therefore an asymptotic expression of the outage capacity of large MIMO uncorrelated channels. In [5], Peacock 
et al. extend the result from [4] in the direction of multi-user communications by considering the sum of K Gram 
matrices $\mathbf{H}_k\mathbf{H}_k^{\mathsf{H}}$, $k \in \{1, \ldots, K\}$, of channels with independent Gaussian entries and separable variance profile. 
The asymptotic eigenvalue distribution of this matrix model is derived (which is in fact a consequence of an earlier 
result from Girko [39]), but neither an explicit expression of the sum rate, as in [4], nor an ergodic 
capacity-maximizing policy is derived. In [23], Soysal et al. derive the sum rate maximizing power allocation 
policy for a finite number of antennas at all transmit and receive devices in the case of K users whose channels 
$\mathbf{H}_k$, $1 \le k \le K$, are perfectly known at the transmitters and are modelled as Kronecker channels. We recall that 
Kronecker channels are made of a matrix with i.i.d. Gaussian entries multiplied both on the left and on the right 
by deterministic Hermitian matrices, hereafter referred to as the (left and right) correlation matrices. Those are 
more general than matrices of independent Gaussian entries with a separable variance profile, which can be seen 
as Gaussian i.i.d. matrices multiplied on the left and on the right by diagonal matrices. Contrary to [4], [23] does 
not provide a theoretical large dimensional analysis of the resulting capacity, and makes the strong assumption that 
all receive correlation matrices are equal. When the receive correlation matrices do not have the same eigenspace, 
determining the channel capacity, both in the finite and asymptotic regimes, is more complex and requires different 
mathematical tools. Those tools allow us in the present work to obtain a deterministic equivalent for every point in 
the per-antenna rate region of the MAC and BC. That is, for every deterministic precoding policy of the transmitters, 
we provide a deterministic approximation of the per-antenna achievable rate. This approximated value is more and 
more accurate as the system dimensions grow large. This is a consequence of our main result, stated in Theorem 
2. We mention that the final formula of Theorem 2 is already found and used by Chen et al. in Equation (32) of 
[10]. However, the latter is provided without proof or rigorous hypotheses on the considered matrices, and 
stems in effect from a flawed usage of the previous Equation (6) in [10], which is only valid when all receive 
correlation matrices have the same eigenspace. Chen et al. also provide the iterative water-filling algorithm which we 
shall introduce in the course of this article to derive the boundary of the ergodic rate region of MAC (see Table 
II). The convergence of this algorithm to the correct capacity, which we will partly prove in the current article, is 
not provided in [10]. 

Regarding multi-cell networks, to the authors' knowledge, few contributions treat simultaneously the problem of 
multi-cell interference in more structured channel models than i.i.d. Gaussian matrices. In [12], the authors carry out 
the performance analysis of TDMA-based networks with inter-cell interference. In [13], a random matrix approach 
is used to study large CDMA-based networks with inter-cell interference, basing their work on the Wyner model 
introduced in [12]. In our particular MIMO context, it is important to mention the work from Moustakas et al. [9] 
who propose an analytic solution to the single-user decoding problem with channel correlation and a single source 
of interference, using the replica method [11]. In the present article, we will extend the results from [9] to more 
interfering sources. Moreover, since the replica method is to this day not proven to be mathematically correct, we 
provide here a proof of the results in [9], under different hypotheses on the matrix model. 

In real channels, each transmitter and each receiver is affected by different correlation patterns. Assuming those 
patterns mutually independent, independent of the propagation environment and known to both communication ends, 
the Kronecker channel is proven to be the most natural channel model [37]. The multi-user Kronecker channel model 
is more general than all previously described channel models: when all receive correlation matrices are equal, we fall 
back on the model in [23], when all correlation matrices are diagonal, we fall back on [5] and, when all correlation 
matrices are identity, we fall back on [4]. Nonetheless, the Kronecker model is only valid when no line-of-sight 
component is present in the channel, when a sufficiently large number of scatterers is found in the communication 
medium to justify the i.i.d. aspect of the inner Gaussian matrix, and when the channel is frequency flat on the 
transmission bandwidth. In their substantial contributions [25]-[27], Hachem et al. have extensively studied the 
point-to-point multi-antenna Rician channel model for which they provide a deterministic equivalent of the ergodic 
capacity [25], the corresponding ergodic capacity-achieving input covariance matrix [26] and a central limit theorem 
for the ergodic capacity [27]. We recall that Rician channels are modelled as the sum of a deterministic line-of-sight 
matrix and a random matrix of independent entries with a variance profile. In [40], Moustakas et al. provide 
an expression of the mutual information in time varying frequency selective Rayleigh channels, using the replica 
method. This result has been recently proven by Dupuy et al. in a yet unpublished work. The same authors then 
derived the expression of the capacity maximizing precoding matrix for the frequency selective channel [41]. Part of 
the present study is inspired by the ideas in [41]. A more general frequency selective Rayleigh channel model with 
non-separable variance profile is studied in [28] by Rashidi et al. using alternative tools from free probability theory. 
The requirements from free probability theory on the studied matrices are more stringent, though, since Gaussian 
distribution must be assumed for the entries of the random matrices, while deterministic matrices in the model must 
have an eigenvalue distribution that converges weakly to a compactly supported distribution. Of practical interest is 
also the theoretical work of Tse [8] on MIMO point-to-point capacity in both uncorrelated and correlated channels, 
whose results are validated by ray-tracing simulations. 

The main contributions of this paper are two theorems, which contribute to the field of random matrix theory and 
enable the evaluation of the per-antenna rate achieved at every point in the MAC and BC rate regions, as well 
as an iterative water-filling algorithm that describes the boundaries of the ergodic rate region of the 
MAC, when all channels are modelled according to the Kronecker model. 

The remainder of this paper is structured as follows: in Section II, we provide a short summary of our results and 
how they apply to multi-user and multi-cellular wireless communications. In Section III, our main two theorems 
are introduced. The complete proofs of both theorems are left to the appendices. In Section IV, the rate region 
of MAC and BC and the capacity of single-user decoding and MMSE decoding with inter-cell interference are 
studied. In this section, we also introduce our third main result: an iterative water-filling algorithm to describe the 
boundary of the ergodic rate region of the MAC. In Section V, we provide simulation results of the previously 
derived theoretical formulas. Finally, in Section VI, we give our conclusions. 

Notation: In the following, boldface lower-case characters represent vectors, capital boldface characters denote 
matrices ($\mathbf{I}_N$ is the $N \times N$ identity matrix). $X_{ij}$ denotes the $(i, j)$ entry of $\mathbf{X}$. The Hermitian transpose is denoted 
$(\cdot)^{\mathsf{H}}$. The operators $\operatorname{tr} \mathbf{X}$, $|\mathbf{X}|$ and $\|\mathbf{X}\|$ represent the trace, determinant and spectral norm of matrix $\mathbf{X}$, respectively. 
The symbol $E[\cdot]$ denotes expectation. The notation $F^{\mathbf{Y}}$ stands for the empirical distribution of the eigenvalues of 
the Hermitian matrix $\mathbf{Y}$. The function $(x)^+$ equals $\max(x, 0)$ for real $x$. For $F$, $G$ two distribution functions, we 
denote $F \Rightarrow G$ the vague convergence of $F$ to $G$. The notation $x_n \xrightarrow{\text{a.s.}} x$ denotes the almost sure convergence of 
the sequence $x_n$ to $x$. 

II. Scope and Summary of Main Results 

In this section, we summarize the main results of this article and explain how they naturally help to study, in 
the present multi-cell multi-user framework, the effects of channel correlation on the antenna efficiency, which we 
define as the achievable rate per transmit antenna. 

A. General Model 

Consider a set of K wireless terminals, equipped with $n_1, \ldots, n_K$ antennas respectively, which we refer to as 
the transmitters, and another wireless device equipped with N antennas, which we call the receiver. We presently 
consider the communication from the terminals to the base station, although in the remainder of this article we 
shall consider both uplink and downlink transmissions. Denote $\mathbf{H}_k \in \mathbb{C}^{N \times n_k}$ the channel matrix model between 
transmitter k and the receiver. Let $\mathbf{H}_k$ be defined as 

$$\mathbf{H}_k = \mathbf{R}_k^{1/2} \mathbf{X}_k \mathbf{T}_k^{1/2} \tag{1}$$

where $\mathbf{R}_k^{1/2} \in \mathbb{C}^{N \times N}$ and $\mathbf{T}_k^{1/2} \in \mathbb{C}^{n_k \times n_k}$ are the nonnegative Hermitian square roots of the Hermitian nonnegative 
matrices $\mathbf{R}_k$ and $\mathbf{T}_k$, respectively, and $\mathbf{X}_k \in \mathbb{C}^{N \times n_k}$ is a realization of a random matrix with Gaussian i.i.d. entries. 
The matrices $\mathbf{T}_k$ and $\mathbf{R}_k$ in this scenario model the correlation present in the channel at transmitter k and at the 
receiver, respectively. It is important to stress that those correlation patterns emerge both from the inter-antenna 
spacings on the volume limited devices and from the solid angles of useful transmitted and received signal energy; 
that is, even though the transmit antennas emit signals in an isotropic manner, only a limited solid angle of emission 
is effectively received, and the same holds for the receiver which captures signal energy from a limited solid angle. 
Without this second factor, it would make sense that all $\mathbf{R}_k$ matrices are equal at the receiver. This would mean that 
signals are received isotropically at the receiver, which is often too strong an assumption to characterize practical 
communication channels. This being said, one can assume physically identical and interchangeable antennas on 
each device. We therefore assume that the diagonal entries of $\mathbf{R}_k$ and $\mathbf{T}_k$, i.e. the variance of the channel fading 
on every antenna, are identical and, up to a scaling factor, equal to one. As a consequence, $\operatorname{tr} \mathbf{R}_k = N$ and 
$\operatorname{tr} \mathbf{T}_k = n_k$. We will see that under these trace constraints, the hypotheses made in Theorem 1, used to characterize 
the capacity of MMSE precoders, are always satisfied, therefore making Theorem 1 valid for all possible figures 
of correlation, including strongly correlated patterns. The hypotheses of Theorem 2, used to characterize the rate 
region of MAC and BC, require additional mild assumptions, making Theorem 2 valid for all but some unrealistic 
correlation matrices $\mathbf{R}_k$ and $\mathbf{T}_k$. These statements are of major importance and rather new since, in alternative 
contributions, e.g. [25], [26], it is usually assumed that the correlation matrices have uniformly bounded spectral 
norms (for all N). This physically means that only low correlation patterns are allowed; short distances between 
antennas and small solid angles of energy propagation are therefore excluded. In the present work, this restriction 
is not needed. The counterpart of this interesting property is a reduction of the convergence rates of the derived 
deterministic equivalents, compared to those proposed in [25] and [26]. 
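
To make the model (1) concrete, the following minimal sketch draws one Kronecker channel realization. It relies on assumptions that are not part of the text: the exponential correlation profile `exp_correlation` is a hypothetical choice that merely satisfies the unit-diagonal trace constraints, and the entries of the inner Gaussian matrix are scaled to variance 1/n_k.

```python
import numpy as np

def exp_correlation(dim, rho):
    """Hypothetical correlation model (assumption): entries rho**|i-j|, unit diagonal."""
    idx = np.arange(dim)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def psd_sqrt(mat):
    """Hermitian nonnegative definite square root via eigendecomposition."""
    w, v = np.linalg.eigh(mat)
    w = np.clip(w, 0.0, None)  # guard against tiny negative round-off
    return (v * np.sqrt(w)) @ v.conj().T

def kronecker_channel(R, T, rng):
    """Draw H = R^{1/2} X T^{1/2}, X with i.i.d. CN(0, 1/n) entries (scaling assumed)."""
    N, n = R.shape[0], T.shape[0]
    X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2 * n)
    return psd_sqrt(R) @ X @ psd_sqrt(T)

rng = np.random.default_rng(0)
N, n = 8, 4                      # receive and transmit antennas (illustrative sizes)
R = exp_correlation(N, 0.9)      # strong receive correlation, tr R = N
T = exp_correlation(n, 0.5)      # milder transmit correlation, tr T = n
H = kronecker_channel(R, T, rng)
```

With unit diagonal entries, the trace constraints tr R_k = N and tr T_k = n_k hold by construction, whatever the value of the correlation parameter.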

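As will be evidenced in Sections IV-A and IV-B, most multi-cell or multi-user capacity performance measures rely on the 
so-called Stieltjes transform and Shannon transform of matrices $\mathbf{B}_N$ of the type 

$$\mathbf{B}_N = \sum_{k=1}^{K} \mathbf{R}_k^{1/2} \mathbf{X}_k \mathbf{T}_k \mathbf{X}_k^{\mathsf{H}} \mathbf{R}_k^{1/2} \tag{2}$$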

We study these matrices, using tools from the field of large dimensional random matrix theory [34]. The Stieltjes 
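transform $m_N(z)$ of the Hermitian nonnegative definite matrix $\mathbf{B}_N \in \mathbb{C}^{N \times N}$ is defined, for $z \in \mathbb{C} \setminus \mathbb{R}^+$, as 

\begin{align}
m_N(z) &= \int \frac{1}{\lambda - z}\, dF^{\mathbf{B}_N}(\lambda) \tag{3} \\
&= \frac{1}{N} \operatorname{tr}\left(\mathbf{B}_N - z\mathbf{I}_N\right)^{-1} \tag{4}
\end{align}

where $F^{\mathbf{B}_N}$ denotes the distribution function of the eigenvalues of $\mathbf{B}_N$. The Stieltjes transform was originally 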
used to characterize the asymptotic distribution of the eigenvalues of large dimensional random matrices. From a 
wireless communications point of view, it can be directly used to characterize the signal-to-interference plus noise 
ratio (SINR) of certain communication models. In the present work, the Stieltjes transform of $\mathbf{B}_N$ matrices defined 
in (2) will be used to approximate the SINR of MMSE decoders in single-user communications with inter-cell 
interference. 
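
The two expressions (3) and (4) of the Stieltjes transform can be checked numerically against each other. The sketch below does so for a small Gram matrix; the matrix sizes, the point z and the function names are illustrative assumptions, not quantities from the text.

```python
import numpy as np

def stieltjes_from_eigs(B, z):
    """Expression (3): average of 1/(lambda - z) over the eigenvalues of B."""
    lam = np.linalg.eigvalsh(B)
    return np.mean(1.0 / (lam - z))

def stieltjes_from_resolvent(B, z):
    """Expression (4): normalized trace of the resolvent (B - z I)^{-1}."""
    N = B.shape[0]
    return np.trace(np.linalg.inv(B - z * np.eye(N))) / N

rng = np.random.default_rng(1)
N, n = 6, 12                                   # illustrative dimensions
X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2 * n)
B = X @ X.conj().T                             # Hermitian nonnegative definite
z = -0.5                                       # a point on the negative real axis
m_eig = stieltjes_from_eigs(B, z)
m_res = stieltjes_from_resolvent(B, z)
```

For real negative z the transform is real and positive, which is the regime used later with z equal to minus the noise variance.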

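Then, there exists a link from the Stieltjes transform to the so-called Shannon transform $V_N(x)$ of $\mathbf{B}_N$, defined 
for $x > 0$ as 

\begin{align}
V_N(x) &= \frac{1}{N} \log\det\left(\mathbf{I}_N + \frac{1}{x}\mathbf{B}_N\right) \tag{5} \\
&= \int_0^{+\infty} \log\left(1 + \frac{\lambda}{x}\right) dF^{\mathbf{B}_N}(\lambda) \tag{6} \\
&= \int_x^{+\infty} \left(\frac{1}{w} - m_N(-w)\right) dw. \tag{7}
\end{align}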

The Shannon transform, named after Claude Shannon, is commonly used to provide approximations of capacity 
expressions in large dimensional systems. In the present work, the Shannon transform of $\mathbf{B}_N$ matrices will be used 
to provide a deterministic approximation of the achievable per-antenna rate for different communication models. 
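
The three expressions (5) to (7) of the Shannon transform can be compared numerically. In the sketch below, the integral in (7) is evaluated with a midpoint rule after the change of variable w = x/t, which is a choice made for this example only; logarithms are natural, so values are in nats.

```python
import numpy as np

def shannon_logdet(B, x):
    """Expression (5): (1/N) log det(I + B/x)."""
    N = B.shape[0]
    _, logabsdet = np.linalg.slogdet(np.eye(N) + B / x)
    return logabsdet / N

def shannon_from_eigs(B, x):
    """Expression (6): average of log(1 + lambda/x) over the eigenvalues of B."""
    lam = np.linalg.eigvalsh(B)
    return np.mean(np.log1p(lam / x))

def shannon_from_stieltjes(B, x, num=4000):
    """Expression (7), integrated numerically with w = x/t, t in (0, 1)."""
    lam = np.linalg.eigvalsh(B)
    t = (np.arange(num) + 0.5) / num                        # midpoints of (0, 1)
    w = x / t
    m = np.mean(1.0 / (lam[None, :] + w[:, None]), axis=1)  # m_N(-w)
    integrand = (1.0 / w - m) * x / t**2                    # includes |dw/dt| = x/t^2
    return np.sum(integrand) / num

rng = np.random.default_rng(2)
N, n = 6, 12                                   # illustrative dimensions
X = (rng.standard_normal((N, n)) + 1j * rng.standard_normal((N, n))) / np.sqrt(2 * n)
B = X @ X.conj().T
x = 1.0
v_det = shannon_logdet(B, x)
v_eig = shannon_from_eigs(B, x)
v_int = shannon_from_stieltjes(B, x)
```

The first two values agree to machine precision, while the third agrees up to the quadrature error of the midpoint rule.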

Before introducing our main results, namely Theorem 1 and Theorem 2, which are rather technical and difficult 
to fathom without a preliminary explanation, we succinctly describe these results in telecommunication terms and 
their consequences to the multi-user multi-cell communication models at hand. 

B. Main results 

The main results of this work come as follows. 

• We first introduce Theorem 1, which provides a deterministic equivalent $m_N^\circ(z)$ for the Stieltjes transform 
$m_N(z)$ of $\mathbf{B}_N$, under the assumption that N and the $n_k$ grow large at the same rate and that the distribution functions 
$\{F^{\mathbf{T}_k}\}_{n_k}$ and $\{F^{\mathbf{R}_k}\}_N$ form tight sequences [35]. That is, we provide an approximation $m_N^\circ(z)$ of $m_N(z)$ 
which does not depend on the realization of the $\mathbf{X}_k$ matrices and which is almost surely asymptotically exact 
when $N \to \infty$. The tightness hypothesis is the key assumption that allows degenerate $\mathbf{R}_k$ and $\mathbf{T}_k$ matrices 
to be valid in our framework, and that therefore allows us to study strongly correlated channel models. 

• We then provide in Theorem 2 a deterministic equivalent $V_N^\circ(x)$ for the Shannon transform $V_N(x)$ of $\mathbf{B}_N$. For 
this theorem, the assumptions on the $\mathbf{R}_k$ and $\mathbf{T}_k$ matrices are only slightly more constraining and of marginal 
importance for practical purposes. Our results theoretically allow the largest eigenvalues of $\mathbf{T}_k$ or $\mathbf{R}_k$ to grow 
linearly with N, as the number of antennas increases, as long as the number of these large eigenvalues is of 
order $o(N)$ (Theorem 1 does not require this condition). 

The major practical interest of Theorems 1 and 2 lies in the possibility to analyze mutual information expressions 
for multi-user multi-antenna channels, no longer as stochastic variables depending on the matrices $\mathbf{X}_k$ but as 
approximated deterministic quantities. The study of those quantities is in general simpler than the study of the 
stochastic expressions, even though the deterministic results are not closed-form expressions but solutions of implicit 
equations (see Section III). In particular, remember that we study here the trade-off 'throughput gain versus cost' 
of adding more antennas to the transmit or receive communication ends. For this reason, the typical figures of 
performance sought for are antenna efficiencies, i.e. the per-transmit antenna normalized capacity, sum rate or rate 
region. Those performance figures are related to the Stieltjes and Shannon transform of $\mathbf{B}_N$-like matrices. We do 
not provide in this study total rate expressions, i.e. N times the Shannon transform, for which asymptotic accuracy 
of the deterministic equivalents cannot be verified. Using different techniques and under more constraining channel 
conditions, alternative works have shown though that deterministic equivalents of the Shannon transform converge 
as $O(1/N^2)$, see e.g. [26] for the case of Rician channels. 

In practical applications, such as the determination of the rate region of MAC and BC, Shannon transform 
expressions (5) are needed to evaluate the achievable rate for all points in the rate region, i.e. for all deterministic 
precoders. When all users have perfect channel state information, there exists a duality between MAC and BC rate 
regions. To determine the BC rate region, one therefore simply needs to treat the dual MAC uplink problem; see 
Section IV-A. In order to account for the effect of the transmit precoders in the channel model, the correlation 
matrices at the MAC transmitters will be replaced by the product of the channel correlation matrix and the precoding 
matrix. Nonetheless, the determination of an explicit form for the optimal precoding matrices which maximize (5) 
for block-fading channels is a very difficult problem both for finite N and in the asymptotic regime. To the authors' 
current knowledge, this has not been solved. In the ergodic sense, when the transmitters only have statistical state 
information about the time varying channel, it is usually possible to determine the precoders that reach the boundaries 
of the ergodic rate region. However, for partial channel state information in the system, MAC-BC duality no longer 
holds, so that the boundary of the ergodic rate region of BC cannot be determined. 

In this article, we will provide (i) a deterministic equivalent for every point in the MAC and BC rate regions for all 
deterministic precoders in block-fading channels and (ii) the precoding matrices which maximize the deterministic 
equivalent of the ergodic Shannon transform $E[V_N]$ of (5) in fast varying MAC channels when statistical channel 
state information is available at the transmitters. The reason why Theorem 1 and Theorem 2 are not able to determine 
the precoders that correspond to the MAC rate region boundary in the block-fading case is explained hereafter. When 
deterministic precoders are used, every point in the rate regions can be estimated by a deterministic equivalent, even 
on the boundary, for all finite system dimensions. This deterministic equivalent is (almost surely) asymptotically 
accurate if, as the system dimensions grow (in some predefined manner), the sequences of precoding matrices 
of growing dimensions satisfy some mild assumptions. One of these assumptions is that the precoding matrices, 
for growing dimensions, are chosen independently of the $\mathbf{X}_k$ matrices. The precoders that reach the rate region 
boundary of MAC block-fading channels are however explicitly built upon the $\mathbf{X}_k$ matrices. Such precoders do not 
satisfy the assumption of independence and our results therefore do not hold, i.e. the deterministic equivalents exist 
but are not asymptotically accurate in this case. In the ergodic sense though, optimal precoders are independent of 
the realizations of the random matrices and can therefore be considered deterministic. Our results will therefore 
hold in this scenario. The precoders that maximize the deterministic equivalent of the ergodic sum rate, and which 
we characterize in this article, have the following interesting properties: 

• their eigenspaces coincide respectively with the eigenspaces of the correlation matrices at the transmitters, 

• their eigenvalues are solution of a classical optimization problem, 

• we provide an iterative water-filling algorithm to determine these eigenvalues, which, whenever it converges, is 
proved to converge to the correct solution. 

Note that those precoders are not claimed to be the true sum-rate maximizing precoders, but only the matrices 
which maximize the deterministic equivalent of the ergodic sum rate. From this fact, it is easy to show that the 
difference between the rates achieved by the deterministic equivalent using those precoders and the true ergodic 
rates achieved using optimal precoders asymptotically goes to zero almost surely. 
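
For readers unfamiliar with water-filling, the sketch below shows the classical single-user water-filling step over parallel channels, with powers of the form $(\mu - 1/g_i)^+$. It is a generic textbook building block under assumed names (`gains`, `total_power`) and is not the iterative multi-user algorithm of this article, which is introduced in Section IV.

```python
import numpy as np

def water_filling(gains, total_power):
    """Maximize sum_i log(1 + g_i p_i) subject to sum_i p_i = total_power, p_i >= 0.

    Returns the power vector and the water level mu, with p_i = max(mu - 1/g_i, 0).
    Assumes strictly positive gains and total_power > 0.
    """
    gains = np.asarray(gains, dtype=float)
    order = np.argsort(gains)[::-1]          # strongest channel first
    inv = 1.0 / gains[order]
    # Try the m strongest channels, decreasing m until the weakest active one gets power.
    for m in range(len(gains), 0, -1):
        mu = (total_power + inv[:m].sum()) / m
        if mu > inv[m - 1]:
            break
    p_sorted = np.maximum(mu - inv, 0.0)
    p = np.empty_like(p_sorted)
    p[order] = p_sorted                      # undo the sorting
    return p, mu

# Illustrative gains: the weakest channel ends up switched off.
p, mu = water_filling([2.0, 1.0, 0.1], total_power=1.0)
```

In this example the water level is 1.25, so the two strongest channels receive 0.75 and 0.25 units of power and the third receives none.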

III. Mathematical Preliminaries 

In this section, we first introduce Theorem 1, which provides a deterministic equivalent for the Stieltjes transform 
of matrices $\mathbf{B}_N$ defined in (2). A deterministic equivalent for the Shannon transform of $\mathbf{B}_N$ is then provided in Theorem 2, under slightly tighter 
assumptions on the matrices $\mathbf{R}_k$ and $\mathbf{T}_k$. 

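Theorem 1: Let K be some fixed positive integer. For some $N \in \mathbb{N}^*$, let 

$$\mathbf{B}_N = \sum_{k=1}^{K} \mathbf{R}_k^{1/2} \mathbf{X}_k \mathbf{T}_k \mathbf{X}_k^{\mathsf{H}} \mathbf{R}_k^{1/2} + \mathbf{S} \tag{8}$$

be an $N \times N$ matrix with the following hypotheses for all $k \in \{1, \ldots, K\}$, 

1) $\mathbf{X}_k = \left(\frac{1}{\sqrt{n_k}} X_{ij}^k\right) \in \mathbb{C}^{N \times n_k}$ with $X_{ij}^k \in \mathbb{C}$ i.i.d. for all N, k, i, j, and $E|X_{11}^k - E X_{11}^k|^2 = 1$, 

2) $\mathbf{R}_k^{1/2} \in \mathbb{C}^{N \times N}$ is the Hermitian nonnegative definite square root of the nonnegative definite Hermitian matrix $\mathbf{R}_k$, 

3) $\mathbf{T}_k = \operatorname{diag}(\tau_1, \ldots, \tau_{n_k})$ with $\tau_i \ge 0$ for all i, 

4) the sequences $\{F^{\mathbf{T}_k}\}_{n_k \ge 1}$ and $\{F^{\mathbf{R}_k}\}_{N \ge 1}$ are tight, i.e. for all $\varepsilon > 0$, there exists $M > 0$ such that 
$F^{\mathbf{T}_k}(M) > 1 - \varepsilon$ and $F^{\mathbf{R}_k}(M) > 1 - \varepsilon$ for all $n_k$, N, 

5) $\mathbf{S} \in \mathbb{C}^{N \times N}$ is Hermitian nonnegative definite, 

6) there exist $b > a > 0$ for which 

$$a < \liminf_N c_k \le \limsup_N c_k < b \tag{9}$$

with $c_k = N/n_k$. 

Also denote, for $z \in \mathbb{C} \setminus \mathbb{R}^+$, $m_N(z) = \int \frac{1}{\lambda - z}\, dF^{\mathbf{B}_N}(\lambda)$, the Stieltjes transform of $\mathbf{B}_N$. Then, as all $n_k$ and N 
grow large, with ratio $c_k$, 

$$m_N(z) - m_N^\circ(z) \xrightarrow{\text{a.s.}} 0 \tag{10}$$

where 

$$m_N^\circ(z) = \frac{1}{N} \operatorname{tr}\left(\mathbf{S} + \sum_{k=1}^{K} \int \frac{\tau_k\, dF^{\mathbf{T}_k}(\tau_k)}{1 + c_k \tau_k e_k(z)}\, \mathbf{R}_k - z\mathbf{I}_N\right)^{-1} \tag{11}$$

and the set of functions $\{e_i(z)\}$, $i \in \{1, \ldots, K\}$, forms the unique solution to the K equations 

$$e_i(z) = \frac{1}{N} \operatorname{tr} \mathbf{R}_i \left(\mathbf{S} + \sum_{k=1}^{K} \int \frac{\tau_k\, dF^{\mathbf{T}_k}(\tau_k)}{1 + c_k \tau_k e_k(z)}\, \mathbf{R}_k - z\mathbf{I}_N\right)^{-1} \tag{12}$$

such that $\operatorname{sgn}(\Im[e_i(z)]) = \operatorname{sgn}(\Im[z])$ when $\Im[z] \neq 0$ and such that $e_i(z) > 0$ when z is real and negative. 
Moreover, for any $\varepsilon > 0$, the convergence of Equation (10) is uniform over any region of $\mathbb{C}$ bounded by a contour 
interior to 

$$\mathbb{C} \setminus \left(\{z : |z| \le \varepsilon\} \cup \{z = x + iv : x > 0, |v| \le \varepsilon\}\right). \tag{13}$$

For all N, the function $m_N^\circ$ is the Stieltjes transform of a distribution function $F_N^\circ$. Denoting $F^{\mathbf{B}_N}$ the empirical 
eigenvalue distribution function of $\mathbf{B}_N$, we finally have 

$$F^{\mathbf{B}_N} - F_N^\circ \Rightarrow 0 \tag{14}$$

weakly and almost surely as $N \to \infty$. 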

Proof: The proof of Theorem 1 is deferred to Appendix A. ■ 
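
As a sanity check of (11) and (12), consider the simplest special case, which is an assumption of this example and not a setting studied in the text: K = 1, S = 0, R_1 = I_N and T_1 = I_{n_1}, with c = c_1, e = e_1 and entries of X_1 of variance 1/n_1. The fixed-point equation then collapses to a quadratic equation, which is the classical Marchenko-Pastur equation for the Stieltjes transform of $\mathbf{X}_1\mathbf{X}_1^{\mathsf{H}}$:

```latex
\begin{align*}
e(z) &= \frac{1}{N}\operatorname{tr}\left(\frac{1}{1 + c\,e(z)}\,\mathbf{I}_N - z\,\mathbf{I}_N\right)^{-1}
      = \left(\frac{1}{1 + c\,e(z)} - z\right)^{-1}, \\
m_N^\circ(z) &= e(z), \\
0 &= c\,z\,e(z)^2 + (z + c - 1)\,e(z) + 1 .
\end{align*}
```

The last line follows by multiplying the first by $(1 + c\,e(z))\left(\frac{1}{1 + c\,e(z)} - z\right)$ and collecting terms in $e(z)$.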

Remark 1: In her PhD dissertation [42], Zhang derives an expression of the limiting eigenvalue distribution 
for the simpler case where K = 1 and $\mathbf{S} = 0$ but $\mathbf{T}_1$ is not constrained to be diagonal. Her work also uses a 
method based on the Stieltjes transform. Based on [42], it seems to the authors that Theorem 1 could well be 
extended to non-diagonal $\mathbf{T}_k$. However, proving so requires involved calculus, which we did not perform here. 
Similar conclusions can be drawn from the work of Rashidi et al. [28], based on operator-valued free probabilistic 
tools, which is a simpler method but which requires that the eigenvalue distributions of $\mathbf{T}_k$, $\mathbf{R}_k$ and $\mathbf{X}_k$ have 
finite support. The latter is too strong an assumption for our present application purposes. Also, in [6], using the 
same techniques as in the proof provided in Appendix A, Silverstein et al. do not assume that the matrices $\mathbf{T}_k$ are 
nonnegative definite. Our result could be extended to this less stringent requirement on the central $\mathbf{T}_k$ matrices, 
although in this case Theorem 1 does not hold for z real negative. For application purposes, it is fundamental here 
that the Stieltjes transform of $\mathbf{B}_N$ exist for $z \in \mathbb{R}^-$, for which it is sufficient that $\mathbf{T}_k \ge 0$ for all k. 

We now claim that, under proper initialization, for $z \in \mathbb{C} \setminus \mathbb{R}^+$, a classical fixed-point algorithm converges surely 
to the solution of (12). This result is largely inspired by the original work of Dupuy et al. [41], used in the context 
of frequency selective channel models, and unfolds as follows. 

Proposition 1: For $z \in \mathbb{C} \setminus \mathbb{R}^+$, the fixed-point algorithm described in Table I converges surely to the unique 
solution $\{e_1(z), \ldots, e_K(z)\}$ of (12), such that $\operatorname{sgn}(\Im[e_i(z)]) = \operatorname{sgn}(\Im[z])$ when $\Im[z] \neq 0$ and such that $e_i(z) > 0$ 
when $z < 0$, for all i. 

Proof: The proof of Proposition 1 is provided in Appendix E. ■ 
We shall see in Section IV-A that every point in the rate regions of MAC and BC can be described in terms of 
the solutions of (12), for $z = -\sigma^2 < 0$, where $\sigma^2$ is the additive Gaussian noise variance in the channel model. As 
such, Proposition 1 is of interest to the practical evaluation of all points in the rate regions. Note that alternative 
techniques, such as Newton's method, are often used to obtain faster convergence than the fixed-point algorithm 
described in Table I. 

Looser hypotheses will be used in the applications of Theorem 1 provided in Section IV. We will specifically 
need the corollary hereafter. 
Define $\varepsilon > 0$, the convergence threshold, and $n \geq 0$, the iteration step. For all 
$k \in \{1, \ldots, K\}$, set $e_k^{0} = -1/z$ and $e_k^{-1} = \infty$. 
while $\max_j \{|e_j^{n} - e_j^{n-1}|\} > \varepsilon$ do 
    for $k \in \{1, \ldots, K\}$ do 
        Compute $e_k^{n+1}$ by evaluating the right-hand side of (12) at $(e_1^{n}, \ldots, e_K^{n})$ 
    end for 
    assign $n \leftarrow n + 1$ 
end while 

TABLE I 

Fixed-point algorithm converging to the solution of (12) 
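
As an illustration of Table I, the following is a minimal Python sketch of the fixed-point iteration in the setting used by Corollary 1 ($\mathbf{S} = 0$, $z = -x$ with $x > 0$). It assumes that each $F^{\mathbf{T}_k}$ is the empirical distribution of a finite list of eigenvalues, so that the integral becomes an average. The function name `fixed_point_e` and its arguments are hypothetical and are not part of the original work.

```python
import numpy as np

def fixed_point_e(R_list, tau_list, x, eps=1e-10, max_iter=10_000):
    """Sketch of the fixed-point iteration of Table I for S = 0 and z = -x < 0.

    R_list[k]   : (N, N) Hermitian nonnegative definite matrix R_k
    tau_list[k] : 1-D array holding the n_k eigenvalues of T_k (assumed input format)
    Returns the array (e_1(-x), ..., e_K(-x)).
    """
    N = R_list[0].shape[0]
    K = len(R_list)
    c = np.array([N / len(tau) for tau in tau_list])   # c_k = N / n_k
    e = np.full(K, 1.0 / x)                            # e_k^0 = -1/z with z = -x
    for _ in range(max_iter):
        # delta_k = integral of tau / (1 + c_k tau e_k) against the empirical F^{T_k}
        delta = [np.mean(tau / (1.0 + c[k] * tau * e[k]))
                 for k, tau in enumerate(tau_list)]
        A = sum(d * R for d, R in zip(delta, R_list)) + x * np.eye(N)
        A_inv = np.linalg.inv(A)
        e_new = np.array([np.trace(R @ A_inv).real / N for R in R_list])
        if np.max(np.abs(e_new - e)) <= eps:
            return e_new
        e = e_new
    raise RuntimeError("fixed-point iteration did not converge")
```

For $K = 1$ and $\mathbf{R}_1 = \mathbf{T}_1 = \mathbf{I}$ with $c_1 = 1$ and $x = 1$, the equation reduces to $e^2 + e - 1 = 0$, whose positive root is $(\sqrt{5} - 1)/2$; this gives a simple sanity check of the sketch.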



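Corollary 1: Let $K$ be some positive integer. For some $N \in \mathbb{N}^*$, let 

$$\mathbf{B}_N = \sum_{k=1}^{K} \mathbf{R}_k^{1/2} \mathbf{X}_k \mathbf{T}_k \mathbf{X}_k^H \mathbf{R}_k^{1/2} \tag{16}$$

be an $N \times N$ matrix with the following hypotheses for all $k \in \{1, \ldots, K\}$, 

1) $\mathbf{X}_k = \left(\frac{1}{\sqrt{n_k}} X_{ij}^{k}\right) \in \mathbb{C}^{N \times n_k}$, where the $X_{ij}^{k}$ are i.i.d. Gaussian with zero mean and unit variance, for each $i$, $j$, $N$, $k$. 

2) $\mathbf{R}_k^{1/2} \in \mathbb{C}^{N \times N}$ is the Hermitian nonnegative definite square root of the nonnegative definite Hermitian matrix $\mathbf{R}_k$, 

3) $\mathbf{T}_k \in \mathbb{C}^{n_k \times n_k}$ is a nonnegative definite Hermitian matrix, 

4) $\{F^{\mathbf{T}_k}\}_{n_k \geq 1}$ and $\{F^{\mathbf{R}_k}\}_{N \geq 1}$ form tight sequences, 

5) there exist $b > a > 0$ for which 

$$a < \liminf_N c_k \leq \limsup_N c_k < b \tag{17}$$

with $c_k = N/n_k$. 

Also denote, for $x > 0$, $m_N(-x) = \frac{1}{N} \operatorname{tr} (\mathbf{B}_N + x \mathbf{I}_N)^{-1}$. Then, as all $N$ and $n_k$ grow large (while $K$ is fixed) 
with ratio $c_k$, 

$$m_N(-x) - m_N^{\circ}(-x) \xrightarrow{\mathrm{a.s.}} 0 \tag{18}$$

where 

$$m_N^{\circ}(-x) = \frac{1}{N} \operatorname{tr} \left( \sum_{k=1}^{K} \int \frac{\tau_k \, dF^{\mathbf{T}_k}(\tau_k)}{1 + c_k \tau_k e_k(-x)} \mathbf{R}_k + x \mathbf{I}_N \right)^{-1} \tag{19}$$

and the set of functions $\{e_i(-x)\}$, $i \in \{1, \ldots, K\}$, form the unique solution to the $K$ equations 

$$e_i(-x) = \frac{1}{N} \operatorname{tr} \mathbf{R}_i \left( \sum_{k=1}^{K} \int \frac{\tau_k \, dF^{\mathbf{T}_k}(\tau_k)}{1 + c_k \tau_k e_k(-x)} \mathbf{R}_k + x \mathbf{I}_N \right)^{-1} \tag{20}$$

such that $e_i(-x) > 0$ for all $i$. 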

Proof: Since the $\mathbf{X}_k$ are Gaussian, the joint distribution of the entries of $\mathbf{X}_k \mathbf{U}$ coincides with that of $\mathbf{X}_k$, for 
$\mathbf{U}$ any $n_k \times n_k$ unitary matrix. Therefore, $\mathbf{X}_k \mathbf{T}_k \mathbf{X}_k^H$ in Theorem 1 can be substituted by $\mathbf{X}_k (\mathbf{U} \mathbf{T}_k \mathbf{U}^H) \mathbf{X}_k^H$ without 
compromising the final result. As a consequence, the $\mathbf{T}_k$ can be taken non-diagonal nonnegative definite Hermitian 
and the result of Theorem 1 holds. ■ 

The deterministic equivalent of the Stieltjes transform $m_N$ of $\mathbf{B}_N$ is then extended to a deterministic equivalent 
of the Shannon transform of $\mathbf{B}_N$ in the following result. 

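Theorem 2: Let $x > 0$ and $\mathbf{B}_N$ be a random Hermitian matrix as defined in Corollary 1 with the following 
additional assumptions 

1) there exists $\alpha > 0$ and a sequence $r_N$, such that, for all $N$, 

$$\max_{1 \leq k \leq K} \max\left(\lambda_{r_N + 1}^{\mathbf{T}_k}, \lambda_{r_N + 1}^{\mathbf{R}_k}\right) \leq \alpha \tag{21}$$

where $\lambda_1^{\mathbf{X}} \geq \cdots \geq \lambda_N^{\mathbf{X}}$ denote the ordered eigenvalues of the $N \times N$ matrix $\mathbf{X}$. 

2) denoting $b_N$ an upper-bound on the spectral norm of the $\mathbf{T}_k$ and $\mathbf{R}_k$, $k \in \{1, \ldots, K\}$, and $\beta$ some real 
constant such that $\beta > K(b/a)(1 + \sqrt{a})^2$, $a_N = b_N^2 \beta$ satisfies 

$$r_N \log(1 + a_N/x) = o(N). \tag{22}$$

Then, for large $N$, $n_k$, the Shannon transform $\mathcal{V}_N(x) = \int \log\left(1 + \frac{1}{x}\lambda\right) dF^{\mathbf{B}_N}(\lambda)$ of $\mathbf{B}_N$ satisfies 

$$\mathcal{V}_N(x) - \mathcal{V}_N^{\circ}(x) \xrightarrow{\mathrm{a.s.}} 0 \tag{23}$$

where 

$$\mathcal{V}_N^{\circ}(x) = \frac{1}{N} \log\det\left( \mathbf{I}_N + \frac{1}{x} \sum_{k=1}^{K} \mathbf{R}_k \int \frac{\tau_k \, dF^{\mathbf{T}_k}(\tau_k)}{1 + c_k e_k(-x) \tau_k} \right) + \sum_{k=1}^{K} \frac{1}{c_k} \int \log\left(1 + c_k e_k(-x) \tau_k\right) dF^{\mathbf{T}_k}(\tau_k) + x \cdot m_N^{\circ}(-x) - 1. \tag{24}$$

Proof: The proof of Theorem 2 is provided in Appendix B. ■ 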
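
To make the statement of Theorem 2 concrete, the sketch below evaluates the deterministic equivalent of the Shannon transform for finite matrices, under the assumption that $F^{\mathbf{T}_k}$ is the empirical distribution of a given list of eigenvalues and that logarithms are natural (rates in nats). The function name `shannon_det_equiv` is hypothetical; the inner loop is the fixed-point iteration of Table I specialized to $z = -x$.

```python
import numpy as np

def shannon_det_equiv(R_list, tau_list, x, eps=1e-12, max_iter=10_000):
    """Sketch: deterministic equivalent of the Shannon transform at x > 0 (in nats).

    R_list[k]   : (N, N) Hermitian nonnegative definite matrix R_k
    tau_list[k] : 1-D array with the eigenvalues of T_k (assumed input format)
    """
    N = R_list[0].shape[0]
    c = [N / len(tau) for tau in tau_list]             # c_k = N / n_k
    e = np.full(len(R_list), 1.0 / x)                  # initialization -1/z, z = -x

    def weighted_sum(e_vec):
        # sum_k R_k * integral of tau / (1 + c_k tau e_k) dF^{T_k}(tau)
        delta = [np.mean(t / (1.0 + ck * t * ek))
                 for t, ck, ek in zip(tau_list, c, e_vec)]
        return sum(d * R for d, R in zip(delta, R_list))

    for _ in range(max_iter):
        A_inv = np.linalg.inv(weighted_sum(e) + x * np.eye(N))
        e_new = np.array([np.trace(R @ A_inv).real / N for R in R_list])
        converged = np.max(np.abs(e_new - e)) <= eps
        e = e_new
        if converged:
            break

    A = weighted_sum(e)
    m = np.trace(np.linalg.inv(A + x * np.eye(N))).real / N       # m_N at -x
    logdet = np.linalg.slogdet(np.eye(N) + A / x)[1] / N
    integrals = sum(np.mean(np.log1p(ck * ek * t)) / ck
                    for t, ck, ek in zip(tau_list, c, e))
    return logdet + integrals + x * m - 1.0
```

In the uncorrelated case $K = 1$, $\mathbf{R}_1 = \mathbf{T}_1 = \mathbf{I}$, $c_1 = 1$, $x = 1$, the three terms can be evaluated by hand from $e = (\sqrt{5}-1)/2$, which provides a quick check of the implementation against a single random draw of $\mathbf{B}_N$.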
Note that this last result is consistent both with [4] when the transmission channels are i.i.d. Gaussian and with 
[9] when $K = 2$. This result is also similar in nature to the expressions obtained in [25] for the multi-antenna Rician 
channel model and in [40] in the case of frequency selective channels. We point out that the expressions obtained 
in [40], [41] and [26], when the entries of the $\mathbf{X}_k$ matrices are Gaussian distributed, suggest a faster convergence 
rate of the deterministic equivalent of the Stieltjes and Shannon transforms than the one obtained here. Indeed, 
while we show here a convergence of order $o(1)$ (which is in fact refined to $o(1/\log^k N)$ for any $k$ in Appendix A), 
in these works, the convergence is proven to be of order $O(1/N^2)$. 

Contrary to these contributions though, we allow the $\mathbf{R}_k$ and $\mathbf{T}_k$ matrices to be more general than uniformly 
bounded in spectral norm. First, Theorem 1 and Corollary 1 require $\{F^{\mathbf{T}_k}\}$ and $\{F^{\mathbf{R}_k}\}$ to form tight sequences. 
Remark now that, because of the trace constraint $\frac{1}{N} \operatorname{tr} \mathbf{R}_k = 1$, all sequences $\{F^{\mathbf{R}_k}\}$ are necessarily tight. Indeed, 
given $\varepsilon > 0$, take $M = 2/\varepsilon$; $N[1 - F^{\mathbf{R}_k}(M)]$ is the number of eigenvalues in $\mathbf{R}_k$ larger than $2/\varepsilon$, which 
is necessarily less than or equal to $N\varepsilon/2$ from the trace constraint, leading to $1 - F^{\mathbf{R}_k}(M) \leq \varepsilon/2$ and then 
$F^{\mathbf{R}_k}(M) \geq 1 - \varepsilon/2 > 1 - \varepsilon$. The same naturally holds for the $\mathbf{T}_k$ matrices. Observe now that Condition 2 in 
Theorem 2 requires a stronger assumption on the correlation matrices. Under the trace constraint, this requires that 
there exists $\alpha > 0$, such that the number of eigenvalues in $\mathbf{R}_k$ greater than $\alpha$ is of order $o(N/\log N)$. This may not 
always be the case, as we presently show with a counter-example. Consider the sequences of matrices $\mathbf{R}_k \in \mathbb{C}^{N \times N}$, 
$N$ a power of 2, whose eigenvalue distribution is a mass in $0$ of density $1 - 1/\log_2 N$ and a mass in $\log_2 N$ of 
density $1/\log_2 N$. Clearly this distribution satisfies the trace constraint and is unbounded, so that for all $\alpha > 0$, one 
can take $N_0$ a power of 2 such that $\log_2 N_0 > \alpha$; for $N > N_0$, $r_N = N/\log_2 N$ and then $r_N \log(1 + a_N/x)/N$ is 
away from $0$ for all $N$ large. This proves that the trace constraint is not enough to satisfy Condition 2 of Theorem 
2. However, physically meaningful correlation matrices do not present this type of exceptional behaviour. Instead, 
low correlation tends to balance all eigenvalues around 1, in which case correlation matrices are uniformly bounded, 
while high correlation tends to bring a very few eigenvalues (much less than $N/\log_2 N$) to be large, the others 
being very small, in which case Condition 2 is satisfied. From now on, we claim that the conditions of Theorem 1 
and Theorem 2 are satisfied for all physically meaningful correlation matrices. 

IV. Applications 

In this section, we provide two applications of Theorems 1 and 2 to the field of wireless communications. First, 
in Section IV-A, we derive an approximation of every point in the rate region of block-fading correlated 
multi-antenna MAC and BC, which is (almost surely) asymptotically accurate for all sequences of deterministic precoders, 
and an approximation of the boundary of the ergodic rate region of multiple access channels, which is (surely) 
asymptotically accurate. We then introduce an iterative power allocation algorithm to maximize the deterministic 
equivalent of the ergodic MAC rate region. Then, in Section IV-B, we provide an approximation of the capacity 
of the single user decoding and MMSE decoding in wireless MIMO networks with inter-cell interference. The 
latter are almost surely asymptotically accurate in the block-fading sense and surely asymptotically accurate in the 
ergodic sense. 

Since in this section we study both uplink transmissions from mobile terminals to base stations and downlink 
transmissions from base stations to terminals, for notational consistency, $\mathbf{T}_k$ matrices will be used to model channel 
correlations at the base stations (be they transmitters or receivers) and $\mathbf{R}_k$ matrices will be used to model channel 
correlations at the mobile terminals (be they transmitters or receivers). 

A. Rate Region of Multiple Access and Broadcast Channels 

1) System Model: Consider a wireless multi-user channel with $K \geq 1$ users indexed from 1 to $K$, controlled 
by a single base station. User $k$ is equipped with $n_k$ antennas while the base station is equipped with $N$ antennas. 
We additionally denote $c_k = N/n_k$. This situation is depicted in Figure 1. 



Fig. 1. Downlink scenario in multi-user broadcast channel 

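Denote $\mathbf{s} \in \mathbb{C}^N$, $\mathrm{E}[\mathbf{s}\mathbf{s}^H] = \mathbf{P}$, the signal transmitted by the base station, with power constraint $\frac{1}{N} \operatorname{tr} \mathbf{P} \leq P$, 
$P > 0$, $\mathbf{y}_k \in \mathbb{C}^{n_k}$ the signal received by user $k$ and $\mathbf{n}_k \sim \mathcal{CN}(0, \sigma^2 \mathbf{I}_{n_k})$ the noise vector received by user $k$.¹ The 
fading MIMO narrowband channel between the base station and user $k$ is denoted $\mathbf{H}_k \in \mathbb{C}^{n_k \times N}$. Moreover, we 
assume that $\mathbf{H}_k$ follows the Kronecker model, 

$$\mathbf{H}_k = \mathbf{R}_k^{1/2} \mathbf{X}_k \mathbf{T}_k^{1/2} \tag{25}$$

with $\mathbf{R}_k \in \mathbb{C}^{n_k \times n_k}$ the (Hermitian) correlation matrix at terminal $k$ with respect to the channel $\mathbf{H}_k$, $\mathbf{T}_k \in \mathbb{C}^{N \times N}$ 
the correlation matrix at the base station with respect to $\mathbf{H}_k$ and $\mathbf{X}_k \in \mathbb{C}^{n_k \times N}$ a random matrix with Gaussian 
independent entries of variance $1/n_k$. In this model, $\mathbf{T}_k$ and $\mathbf{R}_k$ satisfy $\operatorname{tr} \mathbf{T}_k = N$ and $\operatorname{tr} \mathbf{R}_k = n_k$. We additionally 
constrain the eigenvalues of the matrices $\mathbf{R}_k$ and $\mathbf{T}_k$, $k \in \{1, \ldots, K\}$, to satisfy the mild Condition 2 of Theorem 
2. From our previous remark, we mostly allow all but physically meaningless models of covariance matrices. 
Under the above assumptions, the downlink communication model reads 

$$\mathbf{y}_k = \mathbf{H}_k \mathbf{s} + \mathbf{n}_k. \tag{26}$$

Denoting equivalently $\mathbf{s}_k$ the signal transmitted in the dual uplink (MAC) by user $k$, such that $\mathrm{E}[\mathbf{s}_k \mathbf{s}_k^H] = \mathbf{P}_k$, 
$\frac{1}{n_k} \operatorname{tr} \mathbf{P}_k \leq P_k$, $\mathbf{y}$ and $\mathbf{n}$ the signal and the noise received by the base station, respectively, we have the dual uplink 
model 

$$\mathbf{y} = \sum_{k=1}^{K} \mathbf{H}_k^H \mathbf{s}_k + \mathbf{n}. \tag{27}$$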

k=l 

In the following, we will successively study the MAC and BC rate regions for block-fading channels by means 
of the MAC-BC duality [19]. We shall then consider the ergodic rate region for time varying MAC. 

¹Up to a scaling of the power constraints of the individual users, assuming identical noise variance $\sigma^2$ on each receive antenna for every 
user does not restrict the generality and simplifies the theoretical expressions. 
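
The Kronecker model (25) can be simulated in a few lines. The sketch below draws one channel realization, assuming complex Gaussian entries of variance $1/n_k$ for $\mathbf{X}_k$; the helper names `psd_sqrt`, `kronecker_channel` and `exp_corr`, as well as the exponential correlation profile used as an example, are illustrative assumptions and are not taken from the original work.

```python
import numpy as np

def psd_sqrt(A):
    """Hermitian nonnegative definite square root of A via its eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * np.sqrt(np.clip(w, 0.0, None))) @ U.conj().T

def kronecker_channel(R, T, rng):
    """Draw H = R^{1/2} X T^{1/2}, X of size n_k x N with i.i.d. CN(0, 1/n_k) entries."""
    n_k, N = R.shape[0], T.shape[0]
    X = (rng.standard_normal((n_k, N)) + 1j * rng.standard_normal((n_k, N)))
    X /= np.sqrt(2.0 * n_k)
    return psd_sqrt(R) @ X @ psd_sqrt(T)

def exp_corr(n, rho):
    """Hypothetical example profile: entry (a, b) equals rho^|a-b|, so the trace is n."""
    idx = np.arange(n)
    return rho ** np.abs(np.subtract.outer(idx, idx))

rng = np.random.default_rng(1)
R_k = exp_corr(4, 0.7)          # terminal side, n_k = 4, tr R_k = n_k
T_k = exp_corr(8, 0.3)          # base station side, N = 8, tr T_k = N
H_k = kronecker_channel(R_k, T_k, rng)
```

The unit diagonal of the example profile is what enforces the normalizations $\operatorname{tr} \mathbf{T}_k = N$ and $\operatorname{tr} \mathbf{R}_k = n_k$ required by the model.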
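2) MAC and BC Rate Regions in Block Fading Channels: We start by assuming that the channels $\mathbf{H}_1, \ldots, \mathbf{H}_K$ are 
constant over the observation period. The per-receive antenna rate region $\mathcal{C}_{\mathrm{MAC}}(P_1, \ldots, P_K; \mathbf{H}^H)$, under respective 
transmit power constraints $P_1, \ldots, P_K$ for users 1 to $K$ and channel $\mathbf{H}^H = [\mathbf{H}_1^H \ldots \mathbf{H}_K^H]$, reads [21] 

$$\mathcal{C}_{\mathrm{MAC}}(P_1, \ldots, P_K; \mathbf{H}^H) = \bigcup_{\substack{\frac{1}{n_i} \operatorname{tr}(\mathbf{P}_i) \leq P_i \\ \mathbf{P}_i \geq 0 \\ i = 1, \ldots, K}} \left\{ \{R_i, 1 \leq i \leq K\} : \sum_{i \in \mathcal{S}} R_i \leq \frac{1}{N} \log\det\left( \mathbf{I}_N + \frac{1}{\sigma^2} \sum_{i \in \mathcal{S}} \mathbf{H}_i^H \mathbf{P}_i \mathbf{H}_i \right), \forall \mathcal{S} \subset \{1, \ldots, K\} \right\} \tag{28}$$

where $\mathbf{P}_i \geq 0$ stands for "$\mathbf{P}_i$ is nonnegative definite". 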

Since the entries of $\mathbf{X}_k$ have variance $1/n_k$, the power constraints on $\mathbf{P}_1, \ldots, \mathbf{P}_K$ necessarily scale with $n_k$. We 
might have alternatively assumed, as is often the case, that the $\mathbf{X}_k$ have entries of unit variance and therefore that 
the power constraints are independent of the channel dimensions. Before applying Corollary 1 and Theorem 2, we 
need to verify that $\{F^{\mathbf{R}_k \mathbf{P}_k}\}$ for growing $N$ is necessarily tight and that Condition 2 of Theorem 2 is satisfied. 
From the argument given in Section III, both $\{F^{\mathbf{R}_k}\}$ and $\{F^{\mathbf{P}_k}\}$ are tight sequences. For $\varepsilon > 0$ such that $n_k \varepsilon \in \mathbb{N}$, 
we can therefore choose $M$ such that $1 - F^{\mathbf{R}_k}(\sqrt{M}) < \varepsilon/2$ and $1 - F^{\mathbf{P}_k}(\sqrt{M}) < \varepsilon/2$ for all $n_k$; from Lemma 
15, since the smallest $n_k \varepsilon/2 + 1$ eigenvalues of both $\mathbf{R}_k$ and $\mathbf{P}_k$ are less than $\sqrt{M}$, at least the smallest $n_k \varepsilon + 1$ 
eigenvalues of $\mathbf{R}_k \mathbf{P}_k$ are less than $M$, hence $1 - F^{\mathbf{R}_k \mathbf{P}_k}(M) < \varepsilon$ and $\{F^{\mathbf{R}_k \mathbf{P}_k}\}$ is tight. Once again, Condition 2 
of Theorem 2 can be satisfied for all but meaningless $\mathbf{R}_k \mathbf{P}_k$ matrices, and we claim the latter of no relevance to 
the current investigation. 

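We can now apply Theorem 2, which presently states that, for any set $\mathcal{S} \subset \{1, \ldots, K\}$, we have for $N$, $n_k$ large, 
almost surely 

$$\frac{1}{N} \log\det\left( \mathbf{I}_N + \frac{1}{\sigma^2} \sum_{i \in \mathcal{S}} \mathbf{H}_i^H \mathbf{P}_i \mathbf{H}_i \right) = \frac{1}{N} \log\det\left( \mathbf{I}_N + \frac{1}{\sigma^2} \sum_{k \in \mathcal{S}} \int \frac{r_k \, dF^{\mathbf{R}_k \mathbf{P}_k}(r_k)}{1 + c_k e_k(-\sigma^2) r_k} \mathbf{T}_k \right) + \sum_{k \in \mathcal{S}} \frac{1}{c_k} \int \log\left(1 + c_k e_k(-\sigma^2) r_k\right) dF^{\mathbf{R}_k \mathbf{P}_k}(r_k) + \sigma^2 \cdot m_N^{\circ}(-\sigma^2) - 1 + o(1), \tag{29}$$

where 

$$m_N^{\circ}(-\sigma^2) = \frac{1}{N} \operatorname{tr} \left( \sum_{k \in \mathcal{S}} \int \frac{r_k \, dF^{\mathbf{R}_k \mathbf{P}_k}(r_k)}{1 + c_k r_k e_k(-\sigma^2)} \mathbf{T}_k + \sigma^2 \mathbf{I}_N \right)^{-1} \tag{30}$$

and $e_1(-\sigma^2), \ldots, e_K(-\sigma^2)$ are the unique positive solutions to 

$$e_i(-\sigma^2) = \frac{1}{N} \operatorname{tr} \mathbf{T}_i \left( \sum_{k \in \mathcal{S}} \int \frac{r_k \, dF^{\mathbf{R}_k \mathbf{P}_k}(r_k)}{1 + c_k r_k e_k(-\sigma^2)} \mathbf{T}_k + \sigma^2 \mathbf{I}_N \right)^{-1}. \tag{31}$$

From these equations, every point of the MAC rate region (28) can be deterministically approximated. Note that 
the convergence rates of (10) and (23) are dictated to some extent by the $F^{\mathbf{T}_k}$ and $F^{\mathbf{R}_k}$, so that the term $o(1)$ in 
(29) cannot necessarily be bounded for fixed $N$. 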

Now, we can similarly provide a deterministic equivalent to every point in the rate region of the block-fading 
broadcast channel. This rate region, name it $\mathcal{C}_{\mathrm{BC}}(P; \mathbf{H})$, has been recently shown [22] to be achieved by dirty 
paper coding (DPC). For a transmit power constraint $P$ over the compound channel $\mathbf{H}$, it is shown by MAC-BC 
duality that [19] 

$$\mathcal{C}_{\mathrm{BC}}(P; \mathbf{H}) = \bigcup_{\substack{P_1, \ldots, P_K \\ \sum_{k=1}^{K} P_k \leq P}} \mathcal{C}_{\mathrm{MAC}}(P_1, \ldots, P_K; \mathbf{H}^H) \tag{32}$$

which is easily obtained from Equation (28). 

To achieve the boundaries of the MAC and BC rate regions for the block-fading channel, the precoding matrices 
$\mathbf{P}_1, \ldots, \mathbf{P}_K$ need be tailored to the channels $\mathbf{H}_1, \ldots, \mathbf{H}_K$. To this day, no closed-form expression of these optimal 
precoders is available, although an iterative water-filling algorithm has been derived by Yu et al. in [43] to determine 
these precoding matrices. Theorem 2 cannot be used either, since the optimal precoders will be strongly dependent 
on the specific realization of the $\mathbf{H}_k$, and therefore dependent on the $\mathbf{X}_k$. If the $\mathbf{H}_k$ are not known to the transmitters 
though, the optimal precoding matrices obviously no longer depend on the $\mathbf{X}_k$, but certainly on the correlation 
matrices $\mathbf{R}_k$ and $\mathbf{T}_k$. 

When the channel is varying too fast to allow reliable channel estimation, transmitters in a multiple access channel 
typically do not know the exact $\mathbf{H}_k$ matrices. On the contrary, the transmit and receive correlation matrices are in 
this case long-term channel variations that the transmitters can usually reliably estimate. We study this scenario in 
the next section. 

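3) MAC Rate Region in Fast Fading Channels: Suppose that the channels $\mathbf{H}_k$ are varying fast and that the 
transmitters in the MAC only have statistical channel state information, i.e. they only know their respective $\mathbf{T}_k$ 
and $\mathbf{R}_k$ matrices. In this case, the MAC rate region will be referred to as the ergodic rate region. The ergodic rate 
region $\mathcal{C}_{\mathrm{MAC}}^{(\mathrm{ergodic})}$ is then given by 

$$\mathcal{C}_{\mathrm{MAC}}^{(\mathrm{ergodic})}(P_1, \ldots, P_K; \mathbf{H}^H) = \bigcup_{\substack{\frac{1}{n_i} \operatorname{tr}(\mathbf{P}_i) \leq P_i \\ \mathbf{P}_i \geq 0 \\ i = 1, \ldots, K}} \left\{ \{R_i, 1 \leq i \leq K\} : \sum_{i \in \mathcal{S}} R_i \leq \mathrm{E}\left[ \frac{1}{N} \log\det\left( \mathbf{I}_N + \frac{1}{\sigma^2} \sum_{i \in \mathcal{S}} \mathbf{H}_i^H \mathbf{P}_i \mathbf{H}_i \right) \right], \forall \mathcal{S} \subset \{1, \ldots, K\} \right\} \tag{33}$$

where the expectation is taken over the joint random variable $(\mathbf{X}_1, \ldots, \mathbf{X}_K)$. 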

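Now, Theorem 2 states that $\mathcal{V}_N(x) - \mathcal{V}_N^{\circ}(x) \to 0$, as $N$ grows large, on a subset of measure 1 of the probability 
space $\Omega$ that engenders $(\mathbf{X}_1, \ldots, \mathbf{X}_K)$. Integrating this expression over $\Omega$ therefore leads to $\mathrm{E}\mathcal{V}_N(x) - \mathcal{V}_N^{\circ}(x) \to 0$. 
We can therefore apply Theorem 2 to determine the ergodic rate region $\mathcal{C}_{\mathrm{MAC}}^{(\mathrm{ergodic})}$ of the fast fading MAC. For fixed 
$\mathbf{P}_1, \ldots, \mathbf{P}_K$, we therefore have here 

$$\mathrm{E}\left[ \frac{1}{N} \log\det\left( \mathbf{I}_N + \frac{1}{\sigma^2} \sum_{i \in \mathcal{S}} \mathbf{H}_i^H \mathbf{P}_i \mathbf{H}_i \right) \right] = \frac{1}{N} \log\det\left( \mathbf{I}_N + \frac{1}{\sigma^2} \sum_{k \in \mathcal{S}} \int \frac{r_k \, dF^{\mathbf{R}_k \mathbf{P}_k}(r_k)}{1 + c_k e_k(-\sigma^2) r_k} \mathbf{T}_k \right) + \sum_{k \in \mathcal{S}} \frac{1}{c_k} \int \log\left(1 + c_k e_k(-\sigma^2) r_k\right) dF^{\mathbf{R}_k \mathbf{P}_k}(r_k) + \sigma^2 \cdot m_N^{\circ}(-\sigma^2) - 1 + o(1), \tag{34}$$

with $m_N^{\circ}(-\sigma^2)$ given by (30), and where the convergence with growing $N$ is sure. 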
Define $\eta > 0$ the convergence threshold and $l \geq 0$ the iteration step. At step $l = 0$, 
for all $k \in \mathcal{S}$, set $p_{k,i}^{0} = P_k$. 
while $\max_{k,i} \{|p_{k,i}^{l} - p_{k,i}^{l-1}|\} > \eta$ do 
    For $k \in \mathcal{S}$, define $e_k^{l+1}$ as the solution of (12) for $z = -\sigma^2$, obtained from the 
    fixed-point algorithm of Table I. 
    for $k \in \mathcal{S}$ do 
        for $i = 1, \ldots, n_k$ do 
            Set $p_{k,i}^{l+1} = \left( \mu_k - \frac{1}{c_k e_k^{l+1} r_{ki}} \right)^{+}$, with $\mu_k$ such that $\frac{1}{n_k} \operatorname{tr} \mathbf{P}_k = P_k$. 
        end for 
    end for 
    assign $l \leftarrow l + 1$ 
end while 

TABLE II 

Iterative water-filling algorithm 
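
The innermost step of Table II is a classical water-filling problem for one user. The following sketch solves it by bisection on the water level $\mu_k$, assuming that $e_k$ has already been obtained from the fixed-point algorithm and that the eigenvalues $r_{k1}, \ldots, r_{kn_k}$ of $\mathbf{R}_k$ are given; the function name `water_fill` and the bisection strategy are illustrative choices and are not taken from the original work.

```python
import numpy as np

def water_fill(r, c_k, e_k, P_k, tol=1e-12):
    """Sketch: p_i = max(mu - 1/(c_k e_k r_i), 0) with (1/n) * sum_i p_i = P_k.

    r   : eigenvalues r_{k1}, ..., r_{kn_k} of R_k (at least one must be positive)
    c_k : ratio N / n_k
    e_k : current value of e_k(-sigma^2), assumed positive
    P_k : per-antenna power budget of user k
    """
    r = np.asarray(r, dtype=float)
    n = r.size
    inv_gain = np.full(n, np.inf)            # modes with r_i = 0 never receive power
    pos = r > 0
    inv_gain[pos] = 1.0 / (c_k * e_k * r[pos])

    # Bracket the water level: at mu = 0 nothing is allocated, and at
    # mu = n * P_k + min(inv_gain) the strongest mode alone exhausts the budget.
    lo, hi = 0.0, n * P_k + inv_gain.min()
    while hi - lo > tol * max(1.0, hi):
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - inv_gain, 0.0).mean() > P_k:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.5 * (lo + hi) - inv_gain, 0.0)
```

Bisection is used here only because the allocated power is a nondecreasing, piecewise linear function of the water level; a sort-based closed-form search would serve equally well.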



The transmission policy that achieves the boundary of the ergodic rate region requires here to determine the rate 
optimal precoding matrices $\mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|}$, for all $\mathcal{S} \subset \{1, \ldots, K\}$, which are dependent only on the $\mathbf{T}_k$ and $\mathbf{R}_k$ 
correlation matrices. To this end, we first need the following result. 

Proposition 2: If at least one of the correlation matrices $\mathbf{R}_k$, $k \in \mathcal{S}$, is invertible, then the right-hand side of (29) 
is a strictly concave function of $\mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|}$. 

Proof: The proof of Proposition 2 is provided in Appendix C. ■ 

From Proposition 2, we immediately prove that the $|\mathcal{S}|$-ary set of matrices $(\mathbf{P}_1^{\star}, \ldots, \mathbf{P}_{|\mathcal{S}|}^{\star})$, which maximize the 
deterministic equivalent of the ergodic sum rate over the set $\mathcal{S}$, is unique, provided that one of the $\mathbf{R}_k$ is invertible. 
In a very similar way as in [26], we then show that the matrices $\mathbf{P}_k^{\star}$, $k \in \{1, \ldots, |\mathcal{S}|\}$, have the following properties: 
(i) the eigenspace of $\mathbf{P}_k^{\star}$ is the same as that of $\mathbf{R}_k$, (ii) the eigenvalues of $\mathbf{P}_k^{\star}$ are the solutions of a classical 
water-filling problem. 

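Proposition 3: For every $k \in \mathcal{S}$, denote $\mathbf{R}_k = \mathbf{U}_k \mathbf{D}_k \mathbf{U}_k^H$ the spectral decomposition of $\mathbf{R}_k$ with $\mathbf{U}_k$ unitary and 
$\mathbf{D}_k = \mathrm{diag}(r_{k1}, \ldots, r_{kn_k})$ diagonal. Then the covariance matrices $\mathbf{P}_1^{\star}, \ldots, \mathbf{P}_{|\mathcal{S}|}^{\star}$ which maximize the right-hand 
side of (34) satisfy 

1) $\mathbf{P}_k^{\star} = \mathbf{U}_k \mathbf{Q}_k^{\star} \mathbf{U}_k^H$, with $\mathbf{Q}_k^{\star}$ diagonal, i.e. the eigenspace of $\mathbf{P}_k^{\star}$ is the same as the eigenspace of $\mathbf{R}_k$, 

2) denoting $e_k^{\star} = e_k(-\sigma^2)$ when $\mathbf{P}_k = \mathbf{P}_k^{\star}$, for all $k$, the $i$-th diagonal entry $p_{ki}^{\star}$ of $\mathbf{Q}_k^{\star}$ satisfies 

$$p_{ki}^{\star} = \left( \mu_k - \frac{1}{c_k e_k^{\star} r_{ki}} \right)^{+} \tag{35}$$

where the $\mu_k$ are evaluated such that $\frac{1}{n_k} \operatorname{tr} \mathbf{P}_k = P_k$. 

Proof: The proof of Proposition 3 is provided in Appendix D. ■ 

We then propose an iterative water-filling algorithm to obtain the power allocation policy which maximizes the 
right-hand side of (29). This is provided in Table II. 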
Fig. 2. Downlink multi-cell scenario 

In [26], it is proven that the convergence of this algorithm implies its convergence towards the correct limit. 
The line of reasoning in [26] can be directly adapted to the current situation so that, if the iterative 
water-filling algorithm of Table II converges, then $\mathbf{P}_1, \ldots, \mathbf{P}_K$ converge to the capacity-achieving precoding matrices 
$\mathbf{P}_1^{\star}, \ldots, \mathbf{P}_K^{\star}$. However, similar to [26], it is difficult to prove the sure convergence of the water-filling algorithm. 
Nonetheless, extensive simulations suggested that convergence always happens. 

B. Multi-User MIMO 

1) Signal Model: In this section we study the per-antenna rate performance of wireless networks composed of a 
multi-antenna transmitter and a multi-antenna receiver interfered by several multi-antenna transmitters in adjacent 
cells. This scheme is well-suited to multi-cell wireless networks with orthogonal intra-cell and interfering inter-cell 
transmissions, both in the downlink and in the uplink. In particular, this encompasses the following scenarios 

• multi-cell uplink: consider a network of $K$ cells. On a given time or frequency resource, the base station of 
the cell indexed by $i \in \{1, \ldots, K\}$ receives data from a unique terminal user of this cell and is interfered by 
$K - 1$ users transmitting on the same physical resource from remote cells indexed by $j \in \{1, \ldots, K\}$, $j \neq i$. 

• multi-cell downlink: the user being allocated a given physical resource in a cell indexed by $i \in \{1, \ldots, K\}$ 
receives data from its dedicated base station and is interfered by $K - 1$ base stations in neighboring cells 
indexed by $j \in \{1, \ldots, K\}$, $j \neq i$. This situation is depicted in Figure 2. 

In the following, the downlink scheme is considered. Consider a wireless mobile network with $K \geq 1$ cells 
indexed from 1 to $K$, controlled by non-cooperative base stations. We assume that, on a particular time or frequency 
resource, each base station serves only one user. Therefore the base station and the user of cell $j$ will also be indexed 
by $j$. Without loss of generality, we focus our attention on user 1, equipped with $N$ antennas and hereafter referred 
to as the user or the receiver. Every base station $j \in \{1, \ldots, K\}$ is equipped with $n_j$ antennas. Similar to previous 
sections, we denote $c_j = N/n_j$, although now this corresponds to the ratio of the number of antennas at the terminal 
user by the number of antennas at the base stations. 

Denote $\mathbf{s}_j \in \mathbb{C}^{n_j}$ the signal transmitted by base station $j$, $\mathbf{y} \in \mathbb{C}^N$ and $\mathbf{n} \sim \mathcal{CN}(0, \sigma^2 \mathbf{I}_N)$ the signal and 
noise vectors received by the user. We assume uniform power allocation across the base station antennas, so 
that $\mathrm{E}[\mathbf{s}_j \mathbf{s}_j^H] = \mathbf{I}_{n_j}$. The fading MIMO channel between base station $j$ and the user is denoted $\mathbf{H}_j \in \mathbb{C}^{N \times n_j}$. 
Moreover, we assume that $\mathbf{H}_j$ is a Kronecker channel, $\mathbf{H}_j = \mathbf{R}_j^{1/2} \mathbf{X}_j \mathbf{T}_j^{1/2}$, with $\mathbf{R}_j \in \mathbb{C}^{N \times N}$ and $\mathbf{T}_j \in \mathbb{C}^{n_j \times n_j}$. The 
communication model reads 

$$\mathbf{y} = \mathbf{H}_1 \mathbf{s}_1 + \sum_{j=2}^{K} \mathbf{H}_j \mathbf{s}_j + \mathbf{n} \tag{36}$$

where $\mathbf{s}_1$ is the useful signal (from base station 1) and $\mathbf{s}_j$, $j \geq 2$, constitute interfering signals. 

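2) Single User Decoding: We assume block-fading channels and uniform power allocation across the base station 
antennas. If the receiver considers the signals from the $K - 1$ interfering transmitters as Gaussian noise with a 
known variance pattern, then base station 1 can transmit with arbitrarily low decoding error at a per-receive antenna 
rate $\mathcal{C}_{\mathrm{SU}}(\sigma^2)$ given by 

$$\mathcal{C}_{\mathrm{SU}}(\sigma^2) = \frac{1}{N} \log\det\left( \mathbf{I}_N + \frac{1}{\sigma^2} \sum_{j=1}^{K} \mathbf{H}_j \mathbf{H}_j^H \right) - \frac{1}{N} \log\det\left( \mathbf{I}_N + \frac{1}{\sigma^2} \sum_{j=2}^{K} \mathbf{H}_j \mathbf{H}_j^H \right) \tag{37}$$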
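
For a given set of channel realizations, the rate (37) can be evaluated numerically as a difference of two log-determinants. The sketch below assumes natural logarithms (rates in nats) and channel matrices passed as a list whose first element is the useful link; the function name `single_user_rate` is hypothetical.

```python
import numpy as np

def single_user_rate(H_list, sigma2):
    """Sketch of (37): per-receive-antenna rate of user 1, in nats.

    H_list[0] is the useful channel H_1 (N x n_1); H_list[1:] are the
    interfering channels H_2, ..., H_K, treated as Gaussian noise.
    """
    N = H_list[0].shape[0]

    def logdet(channels):
        # log det(I_N + (1/sigma2) * sum_j H_j H_j^H); an empty list gives log det(I_N) = 0
        G = np.zeros((N, N), dtype=complex)
        for H in channels:
            G = G + H @ H.conj().T
        return np.linalg.slogdet(np.eye(N) + G / sigma2)[1]

    return (logdet(H_list) - logdet(H_list[1:])) / N
```

Using `slogdet` rather than `log(det(...))` avoids overflow of the determinant when $N$ is large.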



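Assume $N$ and the $n_i$, $i \in \{1, \ldots, K\}$, are large. From Corollary 1, we define the functions $m^{(i)\circ}(-\sigma^2)$ as the 
approximated Stieltjes transforms of $\sum_{j=i}^{K} \mathbf{H}_j \mathbf{H}_j^H$, $i \in \{1, 2\}$, at point $-\sigma^2$, 

$$m^{(i)\circ}(-\sigma^2) = \frac{1}{N} \operatorname{tr} \left( \sum_{k=i}^{K} \int \frac{\tau_k \, dF^{\mathbf{T}_k}(\tau_k)}{1 + c_k \tau_k e_k^{(i)}(-\sigma^2)} \mathbf{R}_k + \sigma^2 \mathbf{I}_N \right)^{-1} \tag{38}$$

where the set of $e_j^{(i)}(-\sigma^2)$, $i \in \{1, 2\}$, $j \in \{1, \ldots, K\}$, forms the unique solution with positive entries of 

$$e_j^{(i)}(-\sigma^2) = \frac{1}{N} \operatorname{tr} \mathbf{R}_j \left( \sum_{k=i}^{K} \int \frac{\tau_k \, dF^{\mathbf{T}_k}(\tau_k)}{1 + c_k \tau_k e_k^{(i)}(-\sigma^2)} \mathbf{R}_k + \sigma^2 \mathbf{I}_N \right)^{-1} \tag{39}$$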



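From Theorem 2, we then have 

$$\mathcal{C}_{\mathrm{SU}}(\sigma^2) = \frac{1}{N} \log\det\left( \mathbf{I}_N + \frac{1}{\sigma^2} \sum_{k=1}^{K} \mathbf{R}_k \int \frac{\tau_k \, dF^{\mathbf{T}_k}(\tau_k)}{1 + c_k e_k^{(1)}(-\sigma^2) \tau_k} \right) - \frac{1}{N} \log\det\left( \mathbf{I}_N + \frac{1}{\sigma^2} \sum_{k=2}^{K} \mathbf{R}_k \int \frac{\tau_k \, dF^{\mathbf{T}_k}(\tau_k)}{1 + c_k e_k^{(2)}(-\sigma^2) \tau_k} \right) + \sum_{k=1}^{K} \frac{1}{c_k} \int \log\left(1 + c_k e_k^{(1)}(-\sigma^2) \tau_k\right) dF^{\mathbf{T}_k}(\tau_k) - \sum_{k=2}^{K} \frac{1}{c_k} \int \log\left(1 + c_k e_k^{(2)}(-\sigma^2) \tau_k\right) dF^{\mathbf{T}_k}(\tau_k) + \sigma^2 \left[ m^{(1)\circ}(-\sigma^2) - m^{(2)\circ}(-\sigma^2) \right] + o(1) \tag{40}$$

almost surely. 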

Similar to the previous section, the result naturally holds also in the ergodic sense, with almost sure convergence 
replaced here by sure convergence. 

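3) MMSE Decoding: Achieving rates close to $\mathcal{C}_{\mathrm{SU}}$ in practice requires to perform multi-stream decoding at 
once at the receiver. A suboptimal linear technique, the MMSE decoder, is often used instead as it allows to treat 
transmit data streams independently, while maximizing the SINR for each data stream at the receiver. In this section, 
we study the performances of the MMSE decoder for correlated transmission channels. We assume block-fading 
channels $\mathbf{H}_1, \ldots, \mathbf{H}_K$, which are supposed to be perfectly known at the receiver. 

The communication model in this case reads 

$$\mathbf{y} = \mathbf{H}_1^H \left( \sum_{j=1}^{K} \mathbf{H}_j \mathbf{H}_j^H + \sigma^2 \mathbf{I}_N \right)^{-1} \left( \sum_{j=1}^{K} \mathbf{H}_j \mathbf{s}_j + \mathbf{n} \right) \tag{41}$$

where $\mathbf{H}_1^H \left( \sum_{j=1}^{K} \mathbf{H}_j \mathbf{H}_j^H + \sigma^2 \mathbf{I}_N \right)^{-1}$ is the MMSE linear filter at the receiver. 

This technique makes it possible to transmit data reliably at any rate inferior to the per-antenna MMSE capacity 
$\mathcal{C}_{\mathrm{MMSE}}$, defined as 

$$\mathcal{C}_{\mathrm{MMSE}}(\sigma^2) = \frac{1}{N} \sum_{i=1}^{n_1} \log(1 + \gamma_i) \tag{42}$$

where, denoting $\mathbf{h}_j \in \mathbb{C}^N$ the $j$-th column of $\mathbf{H}_1$, $\mathbf{x}_j$ the $j$-th column of $\mathbf{X}_1$ and $t_1, \ldots, t_{n_1}$ the eigenvalues of $\mathbf{T}_1$, 
we have $\sqrt{t_j} \mathbf{R}_1^{1/2} \mathbf{x}_j = \mathbf{h}_j$ and the signal-to-interference plus noise ratio $\gamma_i$ expresses as 

$$\gamma_i = \frac{\mathbf{h}_i^H \left( \sum_{j=1}^{K} \mathbf{H}_j \mathbf{H}_j^H + \sigma^2 \mathbf{I}_N \right)^{-1} \mathbf{h}_i}{1 - \mathbf{h}_i^H \left( \sum_{j=1}^{K} \mathbf{H}_j \mathbf{H}_j^H + \sigma^2 \mathbf{I}_N \right)^{-1} \mathbf{h}_i} \tag{43}$$

$$= \mathbf{h}_i^H \left( \sum_{j=1}^{K} \mathbf{H}_j \mathbf{H}_j^H - \mathbf{h}_i \mathbf{h}_i^H + \sigma^2 \mathbf{I}_N \right)^{-1} \mathbf{h}_i \tag{44}$$

$$= t_i \mathbf{x}_i^H \mathbf{R}_1^{1/2} \left( \sum_{j=1}^{K} \mathbf{H}_j \mathbf{H}_j^H - \mathbf{h}_i \mathbf{h}_i^H + \sigma^2 \mathbf{I}_N \right)^{-1} \mathbf{R}_1^{1/2} \mathbf{x}_i, \tag{45}$$

where Equation (44) is a direct application of Lemma 4. The vectors $\mathbf{x}_i$ have i.i.d. complex Gaussian entries 
with variance $1/n_1$ and the inner matrix of the right-hand side of (45) is independent of $\mathbf{x}_i$ (since the entries of 
$\mathbf{H}_1 \mathbf{H}_1^H - \mathbf{h}_i \mathbf{h}_i^H$ are independent of the entries of $\mathbf{h}_i$). Applying Lemma 7, for $N$ large, we have 

$$\gamma_i = \frac{t_i}{n_1} \operatorname{tr} \mathbf{R}_1 \left( \sum_{j=1}^{K} \mathbf{H}_j \mathbf{H}_j^H - \mathbf{h}_i \mathbf{h}_i^H + \sigma^2 \mathbf{I}_N \right)^{-1} + o(1) \tag{46}$$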



almost surely. 

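From Lemma 5, the rank 1 perturbation $(-\mathbf{h}_i \mathbf{h}_i^H)$ does not affect asymptotically the trace in (46). Therefore, in 
the large $N$ limit, we have 

$$\gamma_i = \frac{t_i}{n_1} \operatorname{tr} \mathbf{R}_1 \left( \sum_{j=1}^{K} \mathbf{H}_j \mathbf{H}_j^H + \sigma^2 \mathbf{I}_N \right)^{-1} + o(1) \tag{47}$$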
Correlation matrix    Eigenvalues 
R1                    0.00  0.00  0.00  0.00  0.00  0.01  0.60  7.39 
R2                    0.00  0.00  0.00  0.00  0.09  0.97  3.03  3.91 
T1                    0.80  0.82  0.92  1.04  1.07  1.09  1.11  1.14 
T2                    0.48  0.52  0.56  0.63  0.79  1.18  1.47  2.37 

TABLE III 

Eigenvalues of correlation matrices for $N = n_1 = n_2 = 8$, $d^R = 0.5\lambda$ and $d^T = 10\lambda$. 



almost surely. 

In the appendix, Equation (91), we prove that the trace in (47) converges almost surely to $e_1^{(1)}(z)$, defined in (39). 
We therefore finally have the compact expression for $\mathcal{C}_{\mathrm{MMSE}}$, 

$$\mathcal{C}_{\mathrm{MMSE}}(\sigma^2) = \frac{1}{N} \sum_{i=1}^{n_1} \log\left(1 + t_i c_1 e_1^{(1)}(-\sigma^2)\right) + o(1) \tag{48}$$

with $c_1 = N/n_1$ and where the convergence is with probability one. 

Taking the expectation over all $\mathbf{X}_k$ matrices on the left-hand side of (48), we have that the same result holds for 
the ergodic MMSE decoding capacity. 
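
The per-stream SINR expressions (43) and (44) are linked by a rank-one inversion identity, which is easy to confirm numerically. The sketch below computes the SINRs from (43) and the resulting per-antenna MMSE rate (42) for given channel realizations, assuming natural logarithms; the function names `mmse_sinrs` and `mmse_rate` are hypothetical.

```python
import numpy as np

def mmse_sinrs(H_list, sigma2):
    """Sketch of (43): SINR of each of the n_1 streams sent over H_list[0]."""
    H1 = H_list[0]
    N = H1.shape[0]
    G = sigma2 * np.eye(N, dtype=complex)
    for H in H_list:
        G = G + H @ H.conj().T                 # sum_j H_j H_j^H + sigma2 * I_N
    G_inv = np.linalg.inv(G)
    # a_i = h_i^H G^{-1} h_i for every column h_i of H_1
    a = np.einsum('ij,ik,kj->j', H1.conj(), G_inv, H1).real
    return a / (1.0 - a)

def mmse_rate(H_list, sigma2):
    """Sketch of (42): per-receive-antenna MMSE rate, in nats."""
    N = H_list[0].shape[0]
    return np.sum(np.log1p(mmse_sinrs(H_list, sigma2))) / N
```

Evaluating (44) directly, by removing $\mathbf{h}_i \mathbf{h}_i^H$ from the inner matrix before inverting, should return the same values up to rounding error.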

V. Simulations and Results 

In the following, we apply the results obtained in Sections IV-A and IV-B to determine the rate region of 
block-fading and time varying multiple access channels, as well as the capacity of multi-user MIMO with inter-cell 
interference. This section is moreover dedicated to the analysis of the antenna-efficiency of the aforementioned 
communication schemes. 

A. Block Fading MAC and BC Rate Regions 

First, we provide simulation results in the context of a two-user multi-access channel, with $N$ antennas at the 
base station and $n_1 = n_2$ antennas at the user terminals. The antennas are placed in linear arrays. We further 
assume that both user terminals are physically identical. To model the transmit and receive correlation matrices, we 
consider both the effects of the distance $d^R$ (resp. $d^T$) between adjacent antennas at the user terminals (resp. at the 
base station) and of the solid angles of effective energy transmission and reception. We assume a channel model 
where signals are transmitted and received isotropically in the vertical direction, but transmitted and received under 
a small angle $\pi/6$ in the horizontal direction. This simulates the situation where a strong propagation path exists 
in a given direction, while the other paths are strongly attenuated. We then model the entries of the correlation 
matrices from a natural extension of Jakes model with privileged direction of signal departure and arrival. Denoting 
$\lambda$ the transmit signal wavelength, the entry $(a, b)$ of, say matrix $\mathbf{T}_1$, is 

$$\mathbf{T}_{1,ab} = \int_{\theta_{\min}^{(\mathbf{T}_1)}}^{\theta_{\max}^{(\mathbf{T}_1)}} \exp\left( 2\pi i |a - b| \frac{d^T}{\lambda} \cos(\theta) \right) d\theta \tag{49}$$
Fig. 3. Rate region of two-user MAC, equal power allocation, for $N = 8$ (top), $N = 16$ (bottom), $n_1 = n_2 = N$, ULA model, antenna 
spacing $d^R/\lambda = 0.5$ (dashed) and $d^R/\lambda = 0.1$ (solid), SNR = 20 dB. Deterministic equivalents are given in thick lines. 

with [6'j^i„ , ^maxj the effective horizontal directions of signal propagation. In our case, we consider 0^^^^ = 27r/3, 
eSii^ = 5tt/6, e^Z'J = n, eS'^J - In/e, e'^,^'^ = O, e^^A] = vr/e, 0^^^' = tt/S and = 27r/3, where 

and 0max are the minimum and maximum angles of transmit or receive energy for the correlation matrix Y. 

We first assume the multi-access block-fading channel with the two users described above. We consider $N = 
n_1 = n_2$, SNR = 20 dB, $d^T = 10\lambda$ and uniform power allocation. In Figure 3, we compare simulation results 
obtained from 1,000 Monte Carlo simulations to the deterministic equivalent obtained in (29), when $N = 8$ (top) 
and $N = 16$ (bottom), and for $d^R = 0.5\lambda$ or $d^R = 0.1\lambda$. We first observe that the empirical rate regions show a 
large variance for $N = 8$ compared to $N = 16$. Nonetheless, the deterministic equivalent, even for $N = 8$, is an 
accurate estimate of any of the empirical pentagons. In terms of antenna efficiency, observe in this specific scenario 
that doubling the number of antennas at both communication sides reduces the achievable transmission rate of user 
2 by 25% when $d^R = 0.5\lambda$ (leading therefore to a 150% total throughput gain), and by 33% when $d^R = 0.1\lambda$ 
(inducing a 133% total throughput gain). For high correlation, doubling the number of antennas therefore results 
in small rate increase. As for the accuracy of the deterministic equivalent, observe that even for strong transmit 
correlation, the deterministic equivalent is very precise, as claimed in Theorems 1 and 2. 

To confirm the accuracy of the deterministic equivalents, even for strong correlation patterns, we depict in Figure 
4 the normalized mean square error of the rate region corner points. That is, for 10,000 Monte Carlo simulations, 
we take the average estimation error of the rates at the corner points provided by the deterministic equivalent, 
normalized by the empirical rates. Observe that, even for low $d^R/\lambda$ ratios (corresponding to high correlations), the 
estimation error goes very fast to zero, as $N$ increases. The fact that the normalized estimation error is even lower 
for higher correlation is only due to the intrinsic large variations of the empirical rate region observed in Figure 3 
when $d^R/\lambda$ is large. In Table III, the eigenvalues of the correlation matrices for $N = 8$, $d^R = 0.5\lambda$ and $d^T = 10\lambda$ 
are provided (they all sum up to 8). Those values confirm that it is possible to have some eigenvalues almost zero, 
while only a few eigenvalues are large, and still have consistent estimation of the per-antenna rate performance; 
this is in phase with the conditions of Theorem 1 and Theorem 2. Note, as a matter of fact, the importance of the 
angular direction of signal arrival or departure, which, for identical antenna spacings, can lead to very different 
correlation patterns. 

Fig. 5. Per-antenna achievable sum rate for SNR varying from −5 dB to 30 dB, for different values of $d^R/\lambda$. 

It was observed in Figure 3 that doubling the number of transmit antennas seemed to be an interesting choice for 
low correlation as the antenna efficiency is not much impaired, while higher correlation seemed to reduce antenna 
efficiency as the number of antennas increases. This trend is verified in Figure 5, where the sum-rate of the MAC 
scenario under study is compared when $N = 8$, $N = 16$, for varying SNR and varying ratios $d^R/\lambda$. For very 
low correlated antennas, i.e. in the case of nearly i.i.d. channel entries, there is no loss in antenna efficiency by 
doubling their number. On the opposite, when the correlation increases, doubling the number of antennas at both 
sides reduces the antenna efficiency. As we assume here equal power allocation across transmit antennas, in which 
case the precoding matrix is deterministic and independent of the channel realization, these conclusions are in phase 
with the conclusions of Goldsmith et al. in [45] and references therein. From Theorem 2, the exact description of 
this phenomenon can be thoroughly analyzed, as a function of the various system parameters involved (as long as 
a Kronecker channel model is assumed). 

Fig. 6. Ergodic rate region of two-user MAC, uniform power allocation, for $N = 2$, $N = 4$ and $N = 8$, $n_1 = n_2 = N$, ULA model, antenna 
spacing $d^R/\lambda = 0.5$. Comparison between simulations and deterministic equivalents (det. eq. in the figure). 

B. Time Varying MAC Ergodic Rate Region 

We now move to the analysis of the ergodic rate region of time varying multiple access channels. In Figure 6, we 
provide a comparison between the simulated ergodic MAC rate region and the associated deterministic equivalents, 
for $N = n_1 = n_2$ varying from 2 to 8, $d^R/\lambda = 0.5$ and all other parameters are as described above. Uniform 
power allocation is applied. It turns out that, although the $N = 2$ case is slightly mismatched, for $N \geq 4$, the 
deterministic equivalent of the ergodic rate region is very accurate. For $N = 8$, the deterministic equivalent is the 
same as that of Figure 3 (top); Figure 6 therefore indicates that the deterministic equivalent in the block-fading 
case is unbiased. In Figure 7, we consider $N = 8$ and provide both deterministic equivalents for $d^R/\lambda = 0.5$, 
$d^R/\lambda = 0.1$, when optimal power allocation is applied or not. Again, both graphs with uniform power allocation 
correspond to the already presented graphs of Figure 3 (top). We did not provide Monte Carlo simulation results 
here, which were found to match exactly the theoretical curves. As expected [45], it turns out that the stronger the 
correlation patterns the higher the benefits of optimal power allocation. Under optimal power allocation though, it 
is less clear how antenna efficiency evolves as a function of $d^R/\lambda$. This is characterized in the following. 

Fig. 7. Ergodic rate region of two-user MAC, equal power allocation (uniform) and rate maximizing power allocation (optimal), for $N = 8$, 
$n_1 = n_2 = N$, ULA model, antenna spacing $d^R/\lambda = 0.5$ (dashed) and $d^R/\lambda = 0.1$ (solid), SNR = 20 dB. 

The antenna efficiency for the ergodic MAC sum rate is provided in Figure 8. When optimal power allocation is
applied, the per-antenna rate loss incurred by the addition of extra antennas is similar to that observed with the
uniform power allocation policy. Compared to Figure 5 though, we observe that the antenna efficiency does not
increase for weakly correlated antennas when optimal power allocation is applied, while the rate achieved when
strong correlation is present increases significantly. Under the simulation conditions of Figure 8, we therefore
conclude that doubling the number of antennas on a volume limited device has limited impact on the antenna
efficiency whenever $d/\lambda$ is of order 1 or more. We also observe the peculiar behaviour, already noticed in [45],
that highly correlated transmissions may lead to higher rates than weakly correlated transmissions in the low SNR
regime. The antenna efficiency is indeed shown here to be larger when $d/\lambda = 0.1$ than when $d/\lambda = 10$ below
SNR = 0 dB. Nonetheless, since strong correlation induces a large decrease in per-antenna efficiency as the number
of antennas increases, the low-SNR point at which strong correlation starts to outperform weak correlation is pushed
further down in the SNR range.

Fig. 8. Achievable per-antenna sum rate for SNR varying from -5 dB to 30 dB, for different values of time varying channels, sum rate
maximizing power allocation.

We therefore conclude that, for a fixed number of antennas, increasing channel correlation helps increase
communication rates in the low SNR regime, but that artificially enhancing correlation by adding more antennas
does not help further. For practical applications in which high correlation and low SNR conditions may often arise,
carrying a large number of antennas is therefore a choice of limited interest. In this case, a trade-off must be found
between higher rates in occasional weakly correlated, high SNR scenarios and the lower operating and manufacturing
costs incurred when embedding a small number of antennas.

C. Multi-User MIMO 

We now apply Equations (40) and (48) to the downlink of a two-cell network. The capacity analyzed here is
the per-antenna ergodic achievable rate on the link between base station 1 and a given user, which is
interfered by the transmissions from base station 2. The relative power of the interfering signal from base station
2 is on average $\Gamma$ times that of base station 1. Base stations 1 and 2 are equipped with linear arrays of $n_1$ and $n_2$
antennas, respectively, and the user with a linear array of N antennas. The transmit and receive correlation matrices
$\mathbf{T}_i$ and $\mathbf{R}_i$, $i \in \{1, 2\}$, are also modeled using the generalized Jakes model, given in (49); the solid angles of
Fig. 9. Ergodic mutual information of point-to-point MIMO for the two-cell downlink scenario with single-user decoding, N varying from 4
to 16, $N = n_1 = n_2$, interfering cell with power $\Gamma = 25\%$.



effective energy transmission and reception are the same as in the previous section. Note however that, in this downlink
scenario, the roles are interchanged, as base stations are now transmitters and no longer receivers.

In Figure 9, we consider the ergodic mutual information of single-user decoding and take $N = n_1 = n_2 = 4$
to $N = n_1 = n_2 = 16$, $\Gamma = 0.25$. Uniform power allocation is applied. The distances between transmit antennas
at the base stations (now transmitters) are $d^T = 10\lambda$, while at the receive terminal, $d^R = 0.5\lambda$. The SNR ranges
from -5 dB to 30 dB. We observe here that Monte Carlo simulations perfectly match the deterministic equivalent
obtained in (40), already for N = n = 4. As it turns out, doubling the number of antennas in this scenario does not
significantly reduce the antenna efficiency, even with strong correlation at the receiver. When performing single-user
decoding, it is therefore an appropriate choice to increase the number of antennas at both communication ends, as
long as the processing costs incurred are not dramatically increased.

In Figure 10, with the same assumptions as previously, we analyze the ergodic capacity of the MMSE decoding
strategy. Here, the deterministic equivalent is only accurate for $N \geq 8$. We observe a significant difference in
performance between the single-user and the suboptimal linear MMSE decoders, especially in the high SNR region,
where the MMSE decoder performance no longer grows linearly with the SNR. Also, additional antennas bring
marginal capacity gain, as their efficiency reduces rapidly with larger N. The comparison to Figure 9 suggests that
additional antennas can be used much more efficiently by simultaneous stream decoding methods than by reinforcing
Fig. 10. Ergodic mutual information of point-to-point MIMO in two-cell downlink, MMSE decoding, N varying from 4 to 16, $N = n_1 = n_2$,
interfering cell with power $\Gamma = 25\%$.



the MMSE decoding method with more antennas. Compromises in the decoding strategy might therefore be
considered when dealing with inter-cell interference. It is in particular known that single-user decoding can be achieved
by performing successive MMSE decoding. If many antennas are available at the receiver side, several MMSE
decoding steps are therefore expected to lead to a strong performance increase compared to the single-step MMSE
decoder.
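The relation between joint decoding, single-step MMSE decoding and successive MMSE decoding can be checked numerically. The sketch below is a simplified illustration under stated assumptions: a single link with white noise only (no inter-cell interference), unit-power independent streams, and hypothetical helper names `stream_sinr` and `decoder_rates`. It uses the standard chain-rule fact that decoding stream k with an MMSE filter, after cancelling the previously decoded streams, achieves in total $\log_2\det(\mathbf{I} + \mathrm{snr}\,\mathbf{H}\mathbf{H}^{\mathsf H})$.

```python
import numpy as np

def stream_sinr(h, k, interferers, snr):
    """Output SINR of the MMSE filter for stream k when the streams listed
    in `interferers` are treated as coloured noise."""
    n_rx = h.shape[0]
    hi = h[:, np.asarray(interferers, dtype=int)]
    cov = np.eye(n_rx) + snr * hi @ hi.conj().T
    hk = h[:, k]
    return snr * np.real(hk.conj() @ np.linalg.solve(cov, hk))

def decoder_rates(h, snr):
    """Return (joint, single-step MMSE, successive MMSE) sum rates in bits."""
    n_rx, n_tx = h.shape
    joint = np.linalg.slogdet(np.eye(n_rx) + snr * h @ h.conj().T)[1] / np.log(2.0)
    # Single step: every other stream interferes.
    one_step = sum(
        np.log2(1.0 + stream_sinr(h, k, [j for j in range(n_tx) if j != k], snr))
        for k in range(n_tx)
    )
    # Successive: streams 0..k-1 are already cancelled, k+1.. still interfere.
    successive = sum(
        np.log2(1.0 + stream_sinr(h, k, list(range(k + 1, n_tx)), snr))
        for k in range(n_tx)
    )
    return joint, one_step, successive
```

For any channel matrix the successive rate equals the joint rate up to rounding, while the single-step MMSE rate is never larger, which is the gap observed between Figures 9 and 10.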

Remark 2: It must be stressed that some scenarios show deterministic equivalent plots that do not converge
as rapidly to the simulated plots as those presented in this section. The following misleading effect is especially
common. As N and the $n_i$ grow at the same rate, the per-antenna rate performance usually decreases (as observed
in all situations here), so that the Monte Carlo simulated capacity values decrease with large N. In parallel, the
deterministic equivalents also decrease. Now, in the case of high correlation at both the transmit and receive
sides, it often turns out that the convergence of the deterministic equivalent to the simulated capacity is very
slow. The resulting effect is that, for moderately large N, the difference between the simulated performance and
its corresponding deterministic equivalent decreases slowly while both curves decrease rapidly to zero; therefore,
the approximation error relative to the exact capacity value increases with N, although the absolute error slowly
decreases. For instance, if both simulation plots and deterministic equivalents decrease as $O(1/N^2)$, while their
difference decreases as $O(1/N)$, then for moderate N the relative difference is of order $O(N)$. This effect is very
unfortunate as it leads to plots where the simulated results are sometimes ten times larger than their deterministic
equivalent, although this effect does not invalidate Theorem 1 and Theorem 2.
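The orders of magnitude in this remark can be made explicit with a short worked equation. The constants $a$ and $b$ below are purely illustrative and are not taken from the simulations. If the simulated per-antenna value behaves as $V_N \approx a/N^2$ and the absolute error as $|V_N - V_N^\circ| \approx b/N$ over a moderate range of N, then

$$ \frac{|V_N - V_N^\circ|}{V_N} \approx \frac{b/N}{a/N^2} = \frac{b}{a}\,N , $$

so that doubling N halves the absolute error while doubling the relative error.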

Note that the aforementioned problem is less pronounced when the correlation matrices have uniformly bounded
spectral norms, as noticed in [26] for Rician channels, as the convergence of the deterministic equivalent is then of order
$O(1/N^2)$, while here the fastest convergence rate that we could prove is of order $o(1/\log^k N)$ for
any k (see proof of Theorem 1 in Appendix A).

VI. Conclusion 

In this contribution, we analyzed the per-antenna rate performance of a family of multi-antenna communication 
schemes including multiple cells and multiple users, while taking into account the correlation effects due to close 
antennas and reduced solid angles of energy transmission. We specifically studied the rate regions of block-fading
MAC and BC channels, the rate region of the time varying MAC channel, as well as the uplink and downlink capacity
of multi-cell networks with inter-cell interference. Our main results stem from novel deterministic equivalents of
the Stieltjes transform and of the Shannon transform of a certain type of large dimensional random matrices. Based
on these new tools, an accurate analysis of the effects of antenna correlation can be directly translated into the
antenna efficiency of multi-user multi-cellular systems. It especially turned out that, for the same communication
scheme, some decoding strategies suffer strongly from an increase in channel correlation, while others do not. This 
suggests that the trade-off between throughput gain of additional antennas and limited incurred processing cost 
strongly depends on the decoding strategy. 
Appendix A
Proof of Theorem 1

Proof: For ease of reading, the proof is divided into several sections.
We first consider the case K = 1, whose generalization to K > 1 is given in Appendix A-E. Therefore, in the
coming sections, we drop the unnecessary indices.

A. Truncation and centralization 

We begin with the truncation and centralization steps, which will replace $\mathbf{X}$, $\mathbf{R}$ and $\mathbf{T}$ by matrices with bounded
entries, more suitable for analysis; the difference of the Stieltjes transforms of the original and new $\mathbf{B}_N$ converges to
zero. Since vague convergence of distribution functions is equivalent to the convergence of their Stieltjes transforms,
it is sufficient to show that the original and new empirical distribution functions of the eigenvalues approach each other
almost surely in the space of subprobability measures on $\mathbb{R}$ with respect to the topology which yields vague
convergence.

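Let $\widetilde{X}_{ij} = X_{ij} 1_{\{|X_{ij}| < \sqrt{N}\}} - E\big(X_{ij} 1_{\{|X_{ij}| < \sqrt{N}\}}\big)$ and $\widetilde{\mathbf{X}} = \big(\tfrac{1}{\sqrt{n}}\widetilde{X}_{ij}\big)$. Then, from c), Lemma 1 and a), Lemma
3, it follows exactly as in the initial truncation and centralization steps in [6] and [18] (which provide more details
in their appendices), that

$$ \big\| F^{\mathbf{B}_N} - F^{\mathbf{S} + \mathbf{R}^{1/2}\widetilde{\mathbf{X}}\mathbf{T}\widetilde{\mathbf{X}}^{\mathsf H}\mathbf{R}^{1/2}} \big\| \xrightarrow{\text{a.s.}} 0 \tag{50} $$

as $N \to \infty$.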

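Let now $\overline{X}_{ij} = \widetilde{X}_{ij} 1_{\{|X_{ij}| < \ln N\}} - E\big(\widetilde{X}_{ij} 1_{\{|X_{ij}| < \ln N\}}\big)$ and $\overline{\mathbf{X}} = \big(\tfrac{1}{\sqrt{n}}\overline{X}_{ij}\big)$. This is the final truncation and
centralization step, which will be handled in practically the same way as in [6], with some minor modifications,
given presently.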

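For any Hermitian non-negative definite $r \times r$ matrix $\mathbf{A}$, let $\lambda_i^{\mathbf A}$ denote its i-th smallest eigenvalue. With
$\mathbf{A} = \mathbf{U}\,\mathrm{diag}(\lambda_1^{\mathbf A}, \ldots, \lambda_r^{\mathbf A})\,\mathbf{U}^{\mathsf H}$ its spectral decomposition, let for any $\alpha > 0$

$$ \mathbf{A}^{\alpha} = \mathbf{U}\,\mathrm{diag}\big(\lambda_1^{\mathbf A} 1_{\{\lambda_1^{\mathbf A} \leq \alpha\}}, \ldots, \lambda_r^{\mathbf A} 1_{\{\lambda_r^{\mathbf A} \leq \alpha\}}\big)\,\mathbf{U}^{\mathsf H} \tag{51} $$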
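The spectral truncation in (51) simply zeroes every eigenvalue above the threshold, so the rank of the difference between a matrix and its truncation equals the number of eigenvalues exceeding the threshold. The following sketch illustrates this; `truncate_spectrum` is a hypothetical helper name introduced here for illustration.

```python
import numpy as np

def truncate_spectrum(a, alpha):
    """Zero out the eigenvalues of the Hermitian matrix `a` that exceed alpha.
    Returns the truncated matrix and the number of removed eigenvalues."""
    w, u = np.linalg.eigh(a)
    kept = np.where(w <= alpha, w, 0.0)
    truncated = (u * kept) @ u.conj().T
    return truncated, int(np.sum(w > alpha))
```

The count returned as the second value is exactly the quantity that the rank bounds following (51) control through the tails of the eigenvalue distributions.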

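Then for any $N \times n$ matrix $\mathbf{Q}$, we get from 1) and 2), Lemma 3,

$$ \big\| F^{\mathbf{S} + \mathbf{R}^{1/2}\mathbf{Q}\mathbf{T}\mathbf{Q}^{\mathsf H}\mathbf{R}^{1/2}} - F^{\mathbf{S} + (\mathbf{R}^{\alpha})^{1/2}\mathbf{Q}\mathbf{T}^{\alpha}\mathbf{Q}^{\mathsf H}(\mathbf{R}^{\alpha})^{1/2}} \big\| \leq \frac{2}{N}\,\mathrm{rank}\big(\mathbf{R}^{1/2} - (\mathbf{R}^{\alpha})^{1/2}\big) + \frac{1}{N}\,\mathrm{rank}(\mathbf{T} - \mathbf{T}^{\alpha}) \tag{52} $$

$$ = \frac{2}{N}\sum_{i=1}^{N} 1_{\{\lambda_i^{\mathbf R} > \alpha\}} + \frac{1}{N}\sum_{i=1}^{n} 1_{\{\lambda_i^{\mathbf T} > \alpha\}} \tag{53} $$

$$ = 2F^{\mathbf R}((\alpha, \infty)) + \frac{1}{c_N}F^{\mathbf T}((\alpha, \infty)) \tag{54} $$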

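Therefore, from the assumptions 4) and 6) in Theorem 1, we have for any sequence $\{\alpha_N\}$ with $\alpha_N \to \infty$

$$ \big\| F^{\mathbf{S} + \mathbf{R}^{1/2}\mathbf{Q}\mathbf{T}\mathbf{Q}^{\mathsf H}\mathbf{R}^{1/2}} - F^{\mathbf{S} + (\mathbf{R}^{\alpha_N})^{1/2}\mathbf{Q}\mathbf{T}^{\alpha_N}\mathbf{Q}^{\mathsf H}(\mathbf{R}^{\alpha_N})^{1/2}} \big\| \to 0 \tag{55} $$

as $N \to \infty$.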
A metric D on probability measures defined on $\mathbb{R}$, which induces the topology of vague convergence, is introduced
in [6] to handle the last truncation step. The matrices studied in [6] are essentially $\mathbf{B}_N$ with $\mathbf{R} = \mathbf{I}_N$. Following
the steps beginning at (3.4) in [6], we see in our case that when $\alpha_N$ is chosen so that as $N \to \infty$, $\alpha_N \uparrow \infty$,

a%{E\Xl^l{x,,\>inN} + N-') -> (56) 

OO IQ 

^^(E|Xn|%|^„l<^)j + l)<oo (57) 

JV=1 



and 



We will get 



as ^ OO. 

Since $E|\overline{X}_{11}|^2 \to 1$ as $N \to \infty$ we can rescale and replace $\overline{\mathbf{X}}$ with $\overline{\mathbf{X}}/\sqrt{E|\overline{X}_{11}|^2}$, whose components are
bounded by $k \ln N$ for some $k > 2$. Let $\log N$ denote the logarithm of N with base $e^{1/k}$ (so that $k \ln N = \log N$).
Therefore, from (55) and (58) we can assume that for each N the $X_{ij}$ are i.i.d., $E X_{11} = 0$, $E|X_{11}|^2 = 1$, and
$|X_{ij}| \leq \log N$.

Later on, the proof will require a restricted growth rate on both $\|\mathbf{R}\|$ and $\|\mathbf{T}\|$. We see from (55) that we can
also assume

$$ \max(\|\mathbf{R}\|, \|\mathbf{T}\|) \leq \log N \tag{59} $$

B. Deterministic approximation of mN{z) 

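Write $\mathbf{X} = [\mathbf{x}_1, \ldots, \mathbf{x}_n]$, $\mathbf{x}_j \in \mathbb{C}^N$, and let $\mathbf{y}_j = (1/\sqrt{n})\mathbf{R}^{1/2}\mathbf{x}_j$. Then we can write

$$ \mathbf{B}_N = \mathbf{S} + \sum_{j=1}^{n} \tau_j \mathbf{y}_j\mathbf{y}_j^{\mathsf H} \tag{60} $$

We assume $z \in \mathbb{C}^+$ and let $v = \Im[z]$. Define

$$ e_N = e_N(z) = (1/N)\,\mathrm{tr}\,\mathbf{R}(\mathbf{B}_N - z\mathbf{I}_N)^{-1} \tag{61} $$

and

$$ p_N = p_N(z) = -\frac{1}{nz}\sum_{j=1}^{n}\frac{\tau_j}{1 + c_N\tau_j e_N} = \int \frac{-\tau}{z(1 + c_N\tau e_N)}\,dF^{\mathbf T}(\tau) \tag{62} $$

Write $\mathbf{B}_N = \mathbf{O}\boldsymbol{\Lambda}\mathbf{O}^{\mathsf H}$, $\boldsymbol{\Lambda} = \mathrm{diag}(\lambda_1, \ldots, \lambda_N)$, its spectral decomposition. Let $\underline{\mathbf{R}} = \{\underline{R}_{ij}\} = \mathbf{O}^{\mathsf H}\mathbf{R}\mathbf{O}$. Then

$$ e_N = (1/N)\,\mathrm{tr}\,\underline{\mathbf{R}}(\boldsymbol{\Lambda} - z\mathbf{I}_N)^{-1} = \frac{1}{N}\sum_{i=1}^{N}\frac{\underline{R}_{ii}}{\lambda_i - z} \tag{63} $$

We therefore see that $e_N$ is the Stieltjes transform of a measure on the nonnegative reals with total mass $(1/N)\,\mathrm{tr}\,\mathbf{R}$.
It follows that both $e_N(z)$ and $z e_N(z)$ map $\mathbb{C}^+$ into $\mathbb{C}^+$. This implies that $p_N(z)$ and $z p_N(z)$ map $\mathbb{C}^+$ into $\mathbb{C}^+$
and, as $z \to \infty$, $z p_N(z) \to -(1/n)\,\mathrm{tr}\,\mathbf{T}$. Therefore, from Lemma 6, $p_N$ is also the Stieltjes transform of a
measure on the nonnegative reals with total mass $(1/n)\,\mathrm{tr}\,\mathbf{T}$. From (59), it follows that

$$ |e_N| \leq v^{-1}\log N \tag{64} $$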
and

$$ \left| \int \frac{\tau\,dF^{\mathbf T}(\tau)}{1 + c_N\tau e_N} \right| = |z p_N(z)| \leq |z| v^{-1}\log N \tag{65} $$

More generally, from Lemma 6, any function of the form

$$ \frac{-\tau}{z(1 + m(z))} \tag{66} $$

where $\tau > 0$ and $m(z)$ is the Stieltjes transform of a finite measure on $\mathbb{R}^+$, is the Stieltjes transform of a measure
on the nonnegative reals with total mass $\tau$. It follows that

$$ \left| \frac{\tau}{1 + m(z)} \right| \leq \tau |z| v^{-1} \tag{67} $$

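Fix now $z \in \mathbb{C}^+$. Let $\mathbf{B}_{(j)} = \mathbf{B}_N - \tau_j\mathbf{y}_j\mathbf{y}_j^{\mathsf H}$. Define $\mathbf{D} = -z\mathbf{I}_N + \mathbf{S} - z p_N(z)\mathbf{R}$. We write

$$ \mathbf{B}_N - z\mathbf{I}_N - \mathbf{D} = \sum_{j=1}^{n}\tau_j\mathbf{y}_j\mathbf{y}_j^{\mathsf H} + z p_N\mathbf{R} \tag{68} $$

Taking inverses and using Lemma 4 we have

$$ \mathbf{D}^{-1} - (\mathbf{B}_N - z\mathbf{I}_N)^{-1} = \sum_{j=1}^{n}\tau_j\mathbf{D}^{-1}\mathbf{y}_j\mathbf{y}_j^{\mathsf H}(\mathbf{B}_N - z\mathbf{I}_N)^{-1} + z p_N\mathbf{D}^{-1}\mathbf{R}(\mathbf{B}_N - z\mathbf{I}_N)^{-1} \tag{69} $$

$$ = \sum_{j=1}^{n}\tau_j\frac{\mathbf{D}^{-1}\mathbf{y}_j\mathbf{y}_j^{\mathsf H}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}}{1 + \tau_j\mathbf{y}_j^{\mathsf H}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j} + z p_N\mathbf{D}^{-1}\mathbf{R}(\mathbf{B}_N - z\mathbf{I}_N)^{-1} \tag{70} $$

Taking traces and dividing by N, we have

$$ \frac{1}{N}\,\mathrm{tr}\,\mathbf{D}^{-1} - m_N(z) = \frac{1}{n}\sum_{j=1}^{n}\tau_j d_j^m \triangleq w_N^m \tag{71} $$

where

$$ d_j^m = \frac{(1/N)\,\mathbf{x}_j^{\mathsf H}\mathbf{R}^{1/2}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{D}^{-1}\mathbf{R}^{1/2}\mathbf{x}_j}{1 + \tau_j\mathbf{y}_j^{\mathsf H}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j} - \frac{(1/N)\,\mathrm{tr}\,\mathbf{R}(\mathbf{B}_N - z\mathbf{I}_N)^{-1}\mathbf{D}^{-1}}{1 + c_N\tau_j e_N} \tag{72} $$

Multiplying both sides of the above matrix identity by $\mathbf{R}$, and then taking traces and dividing by N, we find

$$ \frac{1}{N}\,\mathrm{tr}\,\mathbf{D}^{-1}\mathbf{R} - e_N(z) = \frac{1}{n}\sum_{j=1}^{n}\tau_j d_j^e \triangleq w_N^e \tag{73} $$

where

$$ d_j^e = \frac{(1/N)\,\mathbf{x}_j^{\mathsf H}\mathbf{R}^{1/2}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{R}\mathbf{D}^{-1}\mathbf{R}^{1/2}\mathbf{x}_j}{1 + \tau_j\mathbf{y}_j^{\mathsf H}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j} - \frac{(1/N)\,\mathrm{tr}\,\mathbf{R}(\mathbf{B}_N - z\mathbf{I}_N)^{-1}\mathbf{R}\mathbf{D}^{-1}}{1 + c_N\tau_j e_N} \tag{74} $$

We then show that, for any $k > 0$, almost surely

$$ \lim_{N \to \infty}(\log^k N)\,w_N^m = 0 \tag{75} $$

and

$$ \lim_{N \to \infty}(\log^k N)\,w_N^e = 0 \tag{76} $$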

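Notice that for each j, $\mathbf{y}_j^{\mathsf H}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j$ can be viewed as the Stieltjes transform of a measure on $\mathbb{R}^+$.
Therefore from (67) we have

$$ \left| \frac{1}{1 + \tau_j\mathbf{y}_j^{\mathsf H}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_j} \right| \leq \frac{|z|}{v} \tag{77} $$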
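For each j, let $e_{(j)} = e_{(j)}(z) = (1/N)\,\mathrm{tr}\,\mathbf{R}(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}$, and

$$ p_{(j)} = p_{(j)}(z) = \int \frac{-\tau}{z(1 + c_N\tau e_{(j)})}\,dF^{\mathbf T}(\tau) \tag{78} $$

both being Stieltjes transforms of measures on $\mathbb{R}^+$, along with the integrand for each $\tau$.
Using Lemma 4, Equations (59) and (67), we have

$$ |z p_N - z p_{(j)}| = |e_N - e_{(j)}|\,c_N\left| \int \frac{\tau^2\,dF^{\mathbf T}(\tau)}{(1 + c_N\tau e_N)(1 + c_N\tau e_{(j)})} \right| \leq \frac{c_N |z|^2\log^3 N}{N v^3} \tag{79} $$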

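Let $\mathbf{D}_{(j)} = -z\mathbf{I}_N + \mathbf{S} - z p_{(j)}(z)\mathbf{R}$. Notice that $(\mathbf{B}_N - z\mathbf{I}_N)^{-1}$ and $(\mathbf{B}_{(j)} - z\mathbf{I}_N)^{-1}$ are bounded in spectral
norm by $v^{-1}$ and, from Lemma 8, the same holds true for $\mathbf{D}^{-1}$ and $\mathbf{D}_{(j)}^{-1}$.

In order to handle both $w_N^m$, $d_j^m$ and $w_N^e$, $d_j^e$ at the same time, we shall denote by $\mathbf{E}$ either $\mathbf{R}$ or $\mathbf{I}_N$, and $w_N$,
$d_j$ for now will denote either the original $w_N^m$, $d_j^m$ or $w_N^e$, $d_j^e$.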

Write dn = d] + d'j + d^, + d% where 



^ _ (l/iV)xHR5(B(,) - zI^v)"iED-iR3x,- (l/iV)xHRi (B(,) - 0lAr)-iED-^.iR5x,- 



dj = 



d- = 



l + Tjy^{B(j^-zlN)-^yj 



l + r,y^^(B(,) -zlAr)-iy,- 



(1/7V)xHR5(B(,) - zlAr)-iED(^.jR5x,- (1/A^) tr R(B(,) - zl 



^ED 



0) 



l + r,y"(B(,) -zI^^)-ly, 



l+TJy^(B(,) -zlAr)-iy, 



(1/7V) tr R(B(,) - zIjv)-iED(^.^ (j/jy) tr R(Bjv - zI^)-iED-i 

1 + r,yH (B(,) - zlN)-^yj 1 + T,yj^ (B(,) - zlN)-^yj 

il/N) tr R(Bjv - ^lAr)-^ED-i (l/A^) tr R(BAf - ^Ijv)-^ED-i 



l + r,y^H(B(,)-^lAr)-iy,- 
From Lemma 4, Equations (59), (77) and (79), we have 

^_ || 2 CAT log^iV|0|3 



1 + CNTjCN 



T,-M]|<-||X„ 



-ilogiV 
N 



T,|4|< 



\z\\oi'N f 1 , CN\z\^log^N 

vN 



'{iY 
0, as n ^ 00 



x5^R^(B(,) - ^lAf)-iED-.iR5x,- - trR(B(,) - zlNY^^Ti-}^ 
+ 

-.•141 < ^^^^S^ f|x.^R^(B0) - .Ia.)-R^x, - trRi(Bo, - ^M^R^I + 



From Lemma 7, there exists $K > 0$ such that



E|;^||x,|p-l|'^<i^iV-^WAr 



E-ig |x^"R5(B(,.) - ^lAf)-iED-.;R5x,- - trR(B(,.) - zIn)-^^-Di]\^ < log^^ 



'0') 



N 



E-^|Xj"R^(B(j) - zlAr)-iR^Xj - trR3(BQ-) - zIa,)-1R5|6 < i^iV-^^;-^ log^^ jv 



(80) 

(81) 
(82) 
(83) 

(84) 
(85) 
(86) 

(87) 

(88) 
(89) 
(90) 



All three moments, when multiplied by n times any power of $\log N$, are summable. Applying standard
arguments using the Borel-Cantelli lemma and Boole's inequality (on 4n events), we conclude that, for any $k > 0$,
$\log^k N \max_{j \leq n}\tau_j d_j \to 0$ almost surely as $N \to \infty$. Hence Equations (75) and (76).
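The moment bounds above express the fact that a quadratic form $\mathbf{x}^{\mathsf H}\mathbf{A}\mathbf{x}$ in a vector with i.i.d. entries concentrates around $\mathrm{tr}\,\mathbf{A}$ when $\mathbf{A}$ is independent of $\mathbf{x}$ and has bounded spectral norm. The sketch below illustrates this numerically under simplifying assumptions (Gaussian entries, and a resolvent evaluated at the real point $z = -1$ rather than at a complex z); `quad_form_gap` is a hypothetical helper name.

```python
import numpy as np

def quad_form_gap(n, seed=0):
    """Return |x^H A x - tr A| / n for A = (W + I)^{-1}, where W is an
    n x n Wishart-type matrix independent of the vector x."""
    rng = np.random.default_rng(seed)
    g = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    g /= np.sqrt(2.0 * n)
    # Resolvent of W = g g^H at z = -1; its spectral norm is at most 1.
    a = np.linalg.inv(g @ g.conj().T + np.eye(n))
    x = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2.0)
    return abs(x.conj() @ a @ x - np.trace(a)) / n
```

For Gaussian entries the normalized gap has standard deviation at most $1/\sqrt{n}$, so it shrinks as the dimension grows.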
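C. Existence and uniqueness of $m_N^\circ(z)$

We show now that for any N, n, $\mathbf{S}$, $\mathbf{R}$, $N \times N$ nonnegative definite and $\mathbf{T} = \mathrm{diag}(\tau_1, \ldots, \tau_n)$, $\tau_k \geq 0$ for all
$1 \leq k \leq n$, there exists a unique e with positive imaginary part for which

$$ e = \frac{1}{N}\,\mathrm{tr}\,\mathbf{R}\left( \mathbf{S} + \left[ \int \frac{\tau}{1 + c_N\tau e}\,dF^{\mathbf T}(\tau) \right]\mathbf{R} - z\mathbf{I}_N \right)^{-1} \tag{91} $$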

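For existence we consider the subsequences $\{N_j\}$, $\{n_j\}$ with $N_j = jN$, $n_j = jn$, so that $c_{N_j}$ remains $c_N$, and form
the block diagonal matrices

$$ \mathbf{R}_{N_j} = \mathrm{diag}(\mathbf{R}, \mathbf{R}, \ldots, \mathbf{R}), \quad \mathbf{S}_{N_j} = \mathrm{diag}(\mathbf{S}, \mathbf{S}, \ldots, \mathbf{S}) \tag{92} $$

both $jN \times jN$, and

$$ \mathbf{T}_{N_j} = \mathrm{diag}(\mathbf{T}, \mathbf{T}, \ldots, \mathbf{T}) \tag{93} $$

of size $jn \times jn$.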

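We see that $F^{\mathbf{T}_{N_j}} = F^{\mathbf{T}}$ and the right side of (91) remains unchanged for all $N_j$. Consider a realization where
$w_{N_j}^e \to 0$ as $j \to \infty$. We have $|e_{N_j}(z)| = |(jN)^{-1}\,\mathrm{tr}\,\mathbf{R}_{N_j}(\mathbf{B}_{jN} - z\mathbf{I}_{jN})^{-1}| \leq v^{-1}\log N$, remaining bounded as
$j \to \infty$. Consider then a subsequence for which $e_{N_j}$ converges to, say, e. From (67), we see that

$$ \left| \frac{\tau}{1 + c_N\tau e_{N_j}} \right| \leq \tau |z| v^{-1} \tag{94} $$

so that from the dominated convergence theorem we have

$$ \int \frac{\tau}{1 + c_N\tau e_{N_j}(z)}\,dF^{\mathbf T}(\tau) \to \int \frac{\tau}{1 + c_N\tau e}\,dF^{\mathbf T}(\tau) \tag{95} $$

along this subsequence. Therefore e solves (91).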

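We now show uniqueness. Let e be a solution to (91) and let $e_2 = \Im[e]$. Recalling the definition of $\mathbf{D}$ we write

$$ e = \frac{1}{N}\,\mathrm{tr}\left( \mathbf{D}^{-1}\mathbf{R}(\mathbf{D}^{\mathsf H})^{-1}\left( \mathbf{S} + \left[ \int \frac{\tau}{1 + c_N\tau e^*}\,dF^{\mathbf T}(\tau) \right]\mathbf{R} - z^*\mathbf{I}_N \right) \right) \tag{96} $$

We see that since both $\mathbf{R}$ and $\mathbf{S}$ are Hermitian nonnegative definite, $\mathrm{tr}(\mathbf{D}^{-1}\mathbf{R}(\mathbf{D}^{\mathsf H})^{-1}\mathbf{S})$ is real and nonnegative.
Therefore we can write

$$ e_2 = \frac{1}{N}\,\mathrm{tr}\left( \mathbf{D}^{-1}\mathbf{R}(\mathbf{D}^{\mathsf H})^{-1}\left( \left[ \int \frac{c_N\tau^2 e_2}{|1 + c_N\tau e|^2}\,dF^{\mathbf T}(\tau) \right]\mathbf{R} + v\mathbf{I}_N \right) \right) = e_2\alpha + v\beta \tag{97} $$

where we denoted

$$ \alpha = \frac{1}{N}\,\mathrm{tr}\left( \mathbf{D}^{-1}\mathbf{R}(\mathbf{D}^{\mathsf H})^{-1}\left[ \int \frac{c_N\tau^2}{|1 + c_N\tau e|^2}\,dF^{\mathbf T}(\tau) \right]\mathbf{R} \right) \tag{98} $$

$$ \beta = \frac{1}{N}\,\mathrm{tr}\left( \mathbf{D}^{-1}\mathbf{R}(\mathbf{D}^{\mathsf H})^{-1} \right) \tag{99} $$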


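Let $\underline{e}$ be another solution to (91), with $\underline{e}_2 = \Im[\underline{e}]$, and analogously we can write $\underline{e}_2 = \underline{e}_2\underline{\alpha} + v\underline{\beta}$. Let $\underline{\mathbf{D}}$ denote
$\mathbf{D}$ with e replaced by $\underline{e}$. Then we have $e - \underline{e} = \gamma(e - \underline{e})$ where

$$ \gamma = \int \frac{c_N\tau^2}{(1 + c_N\tau e)(1 + c_N\tau\underline{e})}\,dF^{\mathbf T}(\tau)\;\frac{\mathrm{tr}\,\mathbf{D}^{-1}\mathbf{R}\underline{\mathbf{D}}^{-1}\mathbf{R}}{N} \tag{100} $$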
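If $\mathbf{R}$ is the zero matrix, then $\gamma = 0$, and $e = \underline{e}$ would follow. For $\mathbf{R} \neq 0$ we use Cauchy-Schwarz to find

$$ |\gamma| \leq \left( \int \frac{c_N\tau^2}{|1 + c_N\tau e|^2}\,dF^{\mathbf T}(\tau)\;\frac{\mathrm{tr}\,\mathbf{D}^{-1}\mathbf{R}(\mathbf{D}^{\mathsf H})^{-1}\mathbf{R}}{N} \right)^{1/2}\left( \int \frac{c_N\tau^2}{|1 + c_N\tau\underline{e}|^2}\,dF^{\mathbf T}(\tau)\;\frac{\mathrm{tr}\,\underline{\mathbf{D}}^{-1}\mathbf{R}(\underline{\mathbf{D}}^{\mathsf H})^{-1}\mathbf{R}}{N} \right)^{1/2} \tag{101} $$

$$ = \alpha^{1/2}\,\underline{\alpha}^{1/2} \tag{102} $$

$$ = \left( \frac{e_2\alpha}{e_2\alpha + v\beta} \right)^{1/2}\left( \frac{\underline{e}_2\underline{\alpha}}{\underline{e}_2\underline{\alpha} + v\underline{\beta}} \right)^{1/2} \tag{103} $$

Necessarily $\beta$ and $\underline{\beta}$ are positive since $\mathbf{R} \neq 0$. Therefore $|\gamma| < 1$ so we must have $e = \underline{e}$.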

For $z < 0$ and $e > 0$, the same calculus can be performed, with $\gamma$ remaining the same. The step (97) is changed
by evaluating e, instead of $e_2$, using the same technique. We obtain the same $\alpha$ while $\beta$ is replaced by another
positive scalar. We therefore still have that $|\gamma| < 1$.
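For real negative z, the unique solution of (91) can be computed by plain fixed-point iteration, and the resulting deterministic equivalent can be compared with the empirical Stieltjes transform of one random realization. The sketch below assumes $\mathbf{S} = 0$, a diagonal $\mathbf{T}$, Gaussian entries and $c_N = N/n$; since $\mathbf{S} = 0$, only the eigenvalues of $\mathbf{R}$ are needed. The function names are hypothetical, and the iteration is a simple numerical device, not the existence argument used in the proof.

```python
import numpy as np

def deterministic_equivalent(r_eigs, taus, z, iters=500, tol=1e-12):
    """Iterate e <- (1/N) tr R (eta(e) R - z I)^{-1} for real z < 0, with
    eta(e) = (1/n) sum_j tau_j / (1 + c tau_j e). Returns (e, m)."""
    c = len(r_eigs) / len(taus)
    e = 1.0
    for _ in range(iters):
        eta = np.mean(taus / (1.0 + c * taus * e))
        e_new = np.mean(r_eigs / (eta * r_eigs - z))
        converged = abs(e_new - e) < tol
        e = e_new
        if converged:
            break
    eta = np.mean(taus / (1.0 + c * taus * e))
    m = np.mean(1.0 / (eta * r_eigs - z))
    return e, m

def empirical_stieltjes(r, taus, z, rng):
    """(1/N) tr (B - z I)^{-1} for one draw of B = R^{1/2} X T X^H R^{1/2},
    with X having i.i.d. complex Gaussian entries of variance 1/n."""
    n_rows, n_cols = r.shape[0], len(taus)
    w, u = np.linalg.eigh(r)
    r_half = (u * np.sqrt(np.clip(w, 0.0, None))) @ u.conj().T
    x = (rng.standard_normal((n_rows, n_cols))
         + 1j * rng.standard_normal((n_rows, n_cols))) / np.sqrt(2.0 * n_cols)
    y = (r_half @ x) * np.sqrt(taus)  # column j scaled by sqrt(tau_j)
    b = y @ y.conj().T
    return np.mean(1.0 / (np.linalg.eigvalsh(b) - z))
```

With $\mathbf{R} = \mathbf{I}_N$ and $\mathbf{T} = \mathbf{I}_n$ the fixed-point equation reduces to the Marchenko-Pastur equation, which gives a quick sanity check of the iteration.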



D. Termination of the proof 

Let $e_N^\circ$ denote the solution to (91). We show now that, for any $\ell > 0$, almost surely

$$ \lim_{N \to \infty}\log^{\ell} N\,(e_N - e_N^\circ) = 0 \tag{104} $$



Let el = 9[e?^], and a° = a°^, 
and (67), 



be the values as above for which = e^oP -\- vj3° . We have, using (59) 



= -logiV9 



1 + CNTe% 



< \og^N\z\v-'^ 



Therefore 



a 



< 



log^ N\z\ 



^v"^ +\og^ N\z\ 

Let D°, D denote D as above with e replaced by, respectively and ejv- We have 



ejv = -^trD ^'R-w% 



With 62 = 9[eAr] we write as above 



62 



lt,(D-.RD-"([/ 



620; + v/3 — Sw*^ 



R + t;Ij> 



N 



(105) 
(106) 
(107) 

(108) 
(109) 

(110) 

(111) 

(112) 

(113) 
We have as above $e_N - e_N^\circ = \gamma(e_N - e_N^\circ) + w_N^e$ where now

$$ |\gamma| \leq (\alpha^\circ)^{1/2}\alpha^{1/2} \tag{114} $$

Fix an £ > and consider a reahzation for which log^ N 0, where £' = ma.x{£ +1,4) and n large enough 

so that 

\w%\ < 5— (115) 

2 

Suppose P < 4e^|^|2ioga jv • Thsn by Equations (59) and (67) we get 

a < cjvt;"^|^|^log3iV^ < 1/4 (116) 
which implies |7| < 1/2. Otherwise we get from (108) and (115) 

b\<«'i( T^,,..., V (117) 



< I , I (118) 



Therefore for all N large 



e2a + vP - ^[w%i 

y 

^2 +logAf|2:| 



log^A^|e.-e^|< . (119) 

, _ / log"Af|z| \ 2 
^ +log^ N\z\ J 

< 2v-'^ {v'^ +log^ N\z\){\og^ N)w% (120) 
-^0 (121) 



as n — >^ 00. Therefore (104) follows. 
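Let $m_N^\circ = N^{-1}\,\mathrm{tr}\,(\mathbf{D}^\circ)^{-1}$. We finally show

$$ m_N - m_N^\circ \xrightarrow{\text{a.s.}} 0 \tag{122} $$

as $N \to \infty$. Since $m_N = N^{-1}\,\mathrm{tr}\,\mathbf{D}^{-1} - w_N^m$, we have

$$ m_N - m_N^\circ = \gamma(e_N - e_N^\circ) - w_N^m \tag{123} $$

where now

$$ \gamma = \int \frac{c_N\tau^2}{(1 + c_N\tau e_N)(1 + c_N\tau e_N^\circ)}\,dF^{\mathbf T}(\tau)\;\frac{\mathrm{tr}\,\mathbf{D}^{-1}\mathbf{R}(\mathbf{D}^\circ)^{-1}}{N} \tag{124} $$

From (59) and (67) we get $|\gamma| \leq c_N |z|^2 v^{-4}\log^3 N$. Therefore, from (75) and (104), we get (122).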

Returning to the original assumptions on $X_{11}$, $\mathbf{T}$, and $\mathbf{R}$, for each of a countably infinite collection of z with
positive imaginary part, possessing a cluster point with positive imaginary part, we have (122). Therefore, by Vitali's
convergence theorem, page 168 of [15], for any $\varepsilon > 0$ we have with probability one $m_N(z) - m_N^\circ(z) \to 0$ uniformly
in any region of $\mathbb{C}$ bounded by a contour interior to

$$ \mathbb{C} \setminus \big( \{z : |z| \leq \varepsilon\} \cup \{z = x + iv : x > 0, |v| \leq \varepsilon\} \big) \tag{125} $$



If S = /(R), meaning the eigenvalues of R are changed via / in the spectral decomposition of R, then we have 

1 



f{r)+rj 



r 



f{r) + rj 



dF^{r) 



(126) 
(127) 



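E. Extension to K > 1

Suppose now

$$ \mathbf{B}_N = \mathbf{S} + \sum_{k=1}^{K}\mathbf{R}_k^{1/2}\mathbf{X}_k\mathbf{T}_k\mathbf{X}_k^{\mathsf H}\mathbf{R}_k^{1/2} \tag{128} $$

where K remains fixed, $\mathbf{X}_k$ is $N \times n_k$ satisfying 1), the $\mathbf{X}_k$'s are independent, $\mathbf{R}_k$ satisfies 2) and 4), $\mathbf{T}_k$ is
$n_k \times n_k$ satisfying 3) and 4), $c_k = N/n_k$ satisfies 6), and $\mathbf{S}$ satisfies 5). After truncation and centralization we
may assume the same condition on the entries of the $\mathbf{X}_k$'s, and the spectral norms of the $\mathbf{R}_k$'s and the $\mathbf{T}_k$'s. Write
$\mathbf{y}_{k,j} = (1/\sqrt{n_k})\mathbf{R}_k^{1/2}\mathbf{x}_{k,j}$, with $\mathbf{x}_{k,j}$ denoting the j-th column of $\mathbf{X}_k$, and let $\tau_{k,j}$ denote the j-th diagonal element
of $\mathbf{T}_k$. Then we can write

$$ \mathbf{B}_N = \mathbf{S} + \sum_{k=1}^{K}\sum_{j=1}^{n_k}\tau_{k,j}\mathbf{y}_{k,j}\mathbf{y}_{k,j}^{\mathsf H} \tag{129} $$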



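Define

$$ e_{N,k} = e_{N,k}(z) = (1/N)\,\mathrm{tr}\,\mathbf{R}_k(\mathbf{B}_N - z\mathbf{I}_N)^{-1} \tag{130} $$

and

$$ p_k = p_k(z) = -\frac{1}{n_k z}\sum_{j=1}^{n_k}\frac{\tau_{k,j}}{1 + c_k\tau_{k,j}e_{N,k}} \tag{131} $$

$$ = \int \frac{-\tau_k}{z(1 + c_k\tau_k e_{N,k})}\,dF^{\mathbf{T}_k}(\tau_k) \tag{132} $$

We see $e_{N,k}$ and $p_k$ have the same properties as $e_N$ and $p_N$. Let $\mathbf{B}_{k,(j)} = \mathbf{B}_N - \tau_{k,j}\mathbf{y}_{k,j}\mathbf{y}_{k,j}^{\mathsf H}$. Define $\mathbf{D} =
-z\mathbf{I}_N + \mathbf{S} - \sum_{k=1}^{K} z p_k(z)\mathbf{R}_k$. We write

$$ \mathbf{B}_N - z\mathbf{I}_N - \mathbf{D} = \sum_{k=1}^{K}\left( \sum_{j=1}^{n_k}\tau_{k,j}\mathbf{y}_{k,j}\mathbf{y}_{k,j}^{\mathsf H} + z p_k(z)\mathbf{R}_k \right) \tag{133} $$

Taking inverses and using Lemma 4, we have

$$ \mathbf{D}^{-1} - (\mathbf{B}_N - z\mathbf{I}_N)^{-1} = \sum_{k=1}^{K}\left( \sum_{j=1}^{n_k}\tau_{k,j}\mathbf{D}^{-1}\mathbf{y}_{k,j}\mathbf{y}_{k,j}^{\mathsf H}(\mathbf{B}_N - z\mathbf{I}_N)^{-1} + z p_k\mathbf{D}^{-1}\mathbf{R}_k(\mathbf{B}_N - z\mathbf{I}_N)^{-1} \right) \tag{134} $$

$$ = \sum_{k=1}^{K}\left( \sum_{j=1}^{n_k}\tau_{k,j}\frac{\mathbf{D}^{-1}\mathbf{y}_{k,j}\mathbf{y}_{k,j}^{\mathsf H}(\mathbf{B}_{k,(j)} - z\mathbf{I}_N)^{-1}}{1 + \tau_{k,j}\mathbf{y}_{k,j}^{\mathsf H}(\mathbf{B}_{k,(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_{k,j}} + z p_k\mathbf{D}^{-1}\mathbf{R}_k(\mathbf{B}_N - z\mathbf{I}_N)^{-1} \right) \tag{135} $$

Taking traces and dividing by N, we have

$$ (1/N)\,\mathrm{tr}\,\mathbf{D}^{-1} - m_N(z) = \sum_{k=1}^{K}\frac{1}{n_k}\sum_{j=1}^{n_k}\tau_{k,j}d_{k,j}^m \triangleq w_N^m \tag{136} $$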
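where

$$ d_{k,j}^m = \frac{(1/N)\,\mathbf{x}_{k,j}^{\mathsf H}\mathbf{R}_k^{1/2}(\mathbf{B}_{k,(j)} - z\mathbf{I}_N)^{-1}\mathbf{D}^{-1}\mathbf{R}_k^{1/2}\mathbf{x}_{k,j}}{1 + \tau_{k,j}\mathbf{y}_{k,j}^{\mathsf H}(\mathbf{B}_{k,(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_{k,j}} - \frac{(1/N)\,\mathrm{tr}\,\mathbf{R}_k(\mathbf{B}_N - z\mathbf{I}_N)^{-1}\mathbf{D}^{-1}}{1 + c_k\tau_{k,j}e_{N,k}} \tag{137} $$

For a fixed $i \in \{1, \ldots, K\}$, we multiply the above matrix identity by $\mathbf{R}_i$, take traces and divide by N. Thus we
get

$$ (1/N)\,\mathrm{tr}\,\mathbf{D}^{-1}\mathbf{R}_i - e_{N,i}(z) = \sum_{k=1}^{K}\frac{1}{n_k}\sum_{j=1}^{n_k}\tau_{k,j}d_{k,j}^{e_i} \triangleq w_N^{e_i} \tag{138} $$

where

$$ d_{k,j}^{e_i} = \frac{(1/N)\,\mathbf{x}_{k,j}^{\mathsf H}\mathbf{R}_k^{1/2}(\mathbf{B}_{k,(j)} - z\mathbf{I}_N)^{-1}\mathbf{R}_i\mathbf{D}^{-1}\mathbf{R}_k^{1/2}\mathbf{x}_{k,j}}{1 + \tau_{k,j}\mathbf{y}_{k,j}^{\mathsf H}(\mathbf{B}_{k,(j)} - z\mathbf{I}_N)^{-1}\mathbf{y}_{k,j}} - \frac{(1/N)\,\mathrm{tr}\,\mathbf{R}_k(\mathbf{B}_N - z\mathbf{I}_N)^{-1}\mathbf{R}_i\mathbf{D}^{-1}}{1 + c_k\tau_{k,j}e_{N,k}} \tag{139} $$

In exactly the same way as in the case with K = 1 we find that for any nonnegative $\ell$, $\log^{\ell} N\,w_N^m$ and the
$\log^{\ell} N\,w_N^{e_i}$'s converge almost surely to zero. By considering block diagonal matrices as before with N, the $n_k$'s, $\mathbf{S}$, the $\mathbf{R}_k$'s
and the $\mathbf{T}_k$'s all fixed we find that there exist $e_1^\circ, \ldots, e_K^\circ$ with positive imaginary parts for which for each i

$$ e_i^\circ = \frac{1}{N}\,\mathrm{tr}\,\mathbf{R}_i\left( \mathbf{S} + \sum_{k=1}^{K}\left[ \int \frac{\tau}{1 + c_k\tau e_k^\circ}\,dF^{\mathbf{T}_k}(\tau) \right]\mathbf{R}_k - z\mathbf{I}_N \right)^{-1} \tag{140} $$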

Let us verify uniqueness. Let e° = (e^, . . . , e^)^, and let D° denote the matrix in (140) whose inverse is taken 
(essentially D after the ejv,i's are replaced by the e°'s). Let for each j, e° 2 — and 63 = (e° 2, • • • , 2)^- Then, 
noticing that for each i, tr SD°~'^RiD°^^ is real and nonnegative (positive whenever S 7^ 0) and tr D°^'^RjD°~^ 
and tr RjD°^'^RiD°^^ are real and positive for all i, j, we have 

K 



k=l 



H,2 



Ri - z*l I D°""RiD°"^ 



Ee°2^trR,D°-"R,D°-c,y ^^^^dF^^r) + ^ tr D°-"R,D°-^ 



Let C° = (4), b° = {hi, b%y, where 



4 = -trR,D°-"R,D 



0-1, 



|l + c,-re°|2 



and 



Therefore we have 62 satisfies 



6° = -^trD°-^RiD°-^ 



en = C en -\-vo 



(141) 



(142) 



(143) 



(144) 



(145) 



We see that each e° 2, c°j, and b° are positive. Therefore, from Lemma 9 we have p(C°) < 1. 
Let e° = (e°, . . . , e^)^ be another solution to (140), with Cj, D°, C° = {c°j), b° defined analogously, so that 
(145) holds and p(C°) < 1. We have for each i, 



-e° = ltrR,D°-^E(e°-e°)c, /- 



Cj7-e°)(l + CjTe;-; 



(146) 
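Thus with $\mathbf{A} = (a_{ij})$ where

$$ a_{ij} = \frac{1}{N}\,\mathrm{tr}\,\mathbf{R}_i(\mathbf{D}^\circ)^{-1}\mathbf{R}_j(\underline{\mathbf{D}}^\circ)^{-1}\,c_j\int \frac{\tau^2}{(1 + c_j\tau e_j^\circ)(1 + c_j\tau\underline{e}_j^\circ)}\,dF^{\mathbf{T}_j}(\tau) \tag{147} $$

we have

$$ \mathbf{e}^\circ - \underline{\mathbf{e}}^\circ = \mathbf{A}(\mathbf{e}^\circ - \underline{\mathbf{e}}^\circ) \tag{148} $$

which means, if $\mathbf{e}^\circ \neq \underline{\mathbf{e}}^\circ$, then $\mathbf{A}$ has an eigenvalue equal to 1.
Applying Cauchy-Schwarz we have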



|a,|< (1r.D"-R,D"-h/ ^^^.F-.(.))7^R.D°-R,D°-H/ j^^^dF^^ir) 



(149) 

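$$ = (c_{ij}^\circ)^{1/2}(\underline{c}_{ij}^\circ)^{1/2} \tag{150} $$

Therefore from Lemmas 10 and 11 we get

$$ \rho(\mathbf{A}) \leq \rho\big( ((c_{ij}^\circ)^{1/2}(\underline{c}_{ij}^\circ)^{1/2}) \big) \leq \rho(\mathbf{C}^\circ)^{1/2}\rho(\underline{\mathbf{C}}^\circ)^{1/2} < 1 \tag{151} $$

a contradiction to the statement that $\mathbf{A}$ has an eigenvalue equal to 1. Consequently we have $\mathbf{e}^\circ = \underline{\mathbf{e}}^\circ$.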
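The chain of inequalities in (151) relies on two standard facts about spectral radii, presumably the content of Lemmas 10 and 11 (which are not restated here): the spectral radius of a matrix is bounded by that of any entrywise nonnegative majorant, and the entrywise geometric mean of two nonnegative matrices has spectral radius at most the geometric mean of their spectral radii. The sketch below checks this numerically on arbitrary matrices; `spectral_radius` and `radius_bound_holds` are hypothetical helper names.

```python
import numpy as np

def spectral_radius(m):
    """Largest eigenvalue modulus of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(m))))

def radius_bound_holds(c, c_alt, a, tol=1e-10):
    """Check rho(A) <= sqrt(rho(C) rho(C_alt)) for nonnegative C, C_alt and
    a complex matrix A satisfying |a_ij| <= sqrt(c_ij * c_alt_ij)."""
    if not np.all(np.abs(a) <= np.sqrt(c * c_alt) + tol):
        raise ValueError("A is not dominated by the entrywise geometric mean")
    bound = np.sqrt(spectral_radius(c) * spectral_radius(c_alt))
    return spectral_radius(a) <= bound + tol
```

In the proof both majorant matrices have spectral radius below 1, so the same bound rules out an eigenvalue of $\mathbf{A}$ equal to 1.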

The same reasoning can be applied to z < 0, with e° > 0. In this case matrix A remains the same. The step 
(141) is now replaced by taking e°, instead of its imaginary part, using the same line of reasoning. This leads to the 
same matrix C° with (145) remaining true with b° replaced by another positive vector. The conclusion p(A) < 1 
therefore remains. 

Let ejv = (ejv.i, ■ • ■ ,eN,K)~'' and = (e^ ■ ■ ■ j^°n k) denote the vector solution to (140) for each N. We 
will show for any £ > 0, almost surely 

lim log^ iV(ejv - e%) ^ (152) 

AT— >cx) 



We have 



= (1 tr RiD°-\ . . . , 1 tr RkD"-^)^ (153) 



Let = = —{wf, . . . , w^) . Then we can write 



ejv = (-^trRiD-\...,-itrRKD-i)T + - 



(154) 



Therefore 

eN-e% = A{N){eN-e%)+w^ (155) 



where A(A;') = {aij{N)) with 

a„(iV) (1 ) -'^-'(^) 



(156) 
We let e^ 2' b°j{N), C°{N), b^ ^, and b^, denote the quantities from above, reflecting now their dependence on 
N. Let C{N) = {cij{N)) he K x K with 

C,(iV) = ltrR,D-HR.D-c,/ j^^^^^^dF^^ {r) (157) 

Let ejv,2 = S^[ejv] and = S>[w^]. Define = {bN,i, ■ ■ ■ , ^iv,if)^ with 

6;v,i = tr D-"RiD-i (158) 

Then, as above we find that 

eA,,2 = C(7V)eAr,2 + ^;bjv + w^ (159) 
Using (59) and (67) we see there exists a constant Ki > for which 

4(^)<i^ilog'^6^,. (160) 

and 

Cij{N) < Kilog^ NbN,i (161) 
c,jiN) <Ki\og^N (162) 

for each Therefore, from (145) we see there exists > for which 

e^^iKKlog'Nvb^^i (163) 

Let X be such that is a left eigenvector of C°{N) corresponding to eigenvalue p{C°{N)), guaranteed by Lemma 
12. Then from (159) we have 

xTe^_2 = P(C°(iV))xTe^_2 + i^x^b^ (164) 

Using (164) we have 

l-p(C°(iV)) = ^^^^>(^log^7V)-i (165) 
X e;Y_2 

Fix an ^ > and consider a realization for which log^"'"^"'"^ Nmv% 0, as N ^ oo, where p > 12K — 7. We 
wiU show for aU N large 

p{C{N)) < 1 + log^ A^)-^ (166) 

For each N we rearrange the entries of e^r 2, vh„i + w!;, and C(n) depending on whether the i*^ entry of 
vhm + w| is greater than, or less than or equal to zero. We can therefore assume 

^Cii(7V) Ci2(iV)^ 

where Cii(A'') is fci x fci, C22(-^) is ^2 x ^2, Ci2(A'^) is fci x k2, and C2i(A'^) is ^2 x fci. From Lenmia 9 we have 
p(Cii(iV)) < 1. If vbN,i + yf2,i < 0, then necessarily vbN,i < |w^| < Ki{\ogn)-^^+P\ and so from (163) we 
have the entries of C2i(A'') and C22(-^) bounded by ii'i(log A'')"^. We may assume for all A'' large Q <k\ < K, 
since otherwise we would have p{C{N)) < 1. 



(167) 
We seek an expression for $\det(\mathbf{C}(N) - \lambda\mathbf{I})$ in which Lemma 14 can be used. We consider N large enough so
that, for $|\lambda| \geq 1/2$, we have $(\mathbf{C}_{22}(N) - \lambda\mathbf{I})^{-1}$ existing with entries uniformly bounded. We have

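$$ \det(\mathbf{C}(N) - \lambda\mathbf{I}) = \det\left[ \begin{pmatrix} \mathbf{I} & -\mathbf{C}_{12}(N)(\mathbf{C}_{22}(N) - \lambda\mathbf{I})^{-1} \\ 0 & \mathbf{I} \end{pmatrix}\begin{pmatrix} \mathbf{C}_{11}(N) - \lambda\mathbf{I} & \mathbf{C}_{12}(N) \\ \mathbf{C}_{21}(N) & \mathbf{C}_{22}(N) - \lambda\mathbf{I} \end{pmatrix} \right] \tag{168} $$

$$ = \det\begin{pmatrix} \mathbf{C}_{11}(N) - \lambda\mathbf{I} - \mathbf{C}_{12}(N)(\mathbf{C}_{22}(N) - \lambda\mathbf{I})^{-1}\mathbf{C}_{21}(N) & 0 \\ \mathbf{C}_{21}(N) & \mathbf{C}_{22}(N) - \lambda\mathbf{I} \end{pmatrix} \tag{169} $$

$$ = \det\big( \mathbf{C}_{11}(N) - \lambda\mathbf{I} - \mathbf{C}_{12}(N)(\mathbf{C}_{22}(N) - \lambda\mathbf{I})^{-1}\mathbf{C}_{21}(N) \big)\det(\mathbf{C}_{22}(N) - \lambda\mathbf{I}) \tag{170} $$

We see then that for $\lambda = \rho(\mathbf{C}(N))$ real and greater than 1,

$$ \det\big( \mathbf{C}_{11}(N) - \lambda\mathbf{I} - \mathbf{C}_{12}(N)(\mathbf{C}_{22}(N) - \lambda\mathbf{I})^{-1}\mathbf{C}_{21}(N) \big) \tag{171} $$

must be zero.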
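The block factorization used above is the usual Schur complement identity for determinants. A minimal numerical check, with a hypothetical helper `schur_det` and an arbitrary test matrix, is sketched below.

```python
import numpy as np

def schur_det(c, k1, lam):
    """det(C - lam I) computed as
    det(C11 - lam I - C12 (C22 - lam I)^{-1} C21) * det(C22 - lam I),
    where C11 is the leading k1 x k1 block of C."""
    m = c - lam * np.eye(c.shape[0])
    m11, m12 = m[:k1, :k1], m[:k1, k1:]
    m21, m22 = m[k1:, :k1], m[k1:, k1:]
    schur = m11 - m12 @ np.linalg.solve(m22, m21)
    return np.linalg.det(schur) * np.linalg.det(m22)
```

The factorization requires the lower-right block to be invertible, which is exactly why the argument restricts $\lambda$ to a region where that inverse exists with bounded entries.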

Notice that from (163), the entries of Ci2(A/')(C22(-/V) — XI)~^C2i{N) can be made smaller than any negative 
power of log AT for p sufficiently large. Notice also that the diagonal elements of Cii(A'^) are all less than 1. From 
this. Lemma 13 and (163), we see that p{C{N)) < Ki log^ N. The determinant in (171) can be written as 

det(Cii(7V)-AI)+5(A) (172) 

Where g{X) is a sum of products, each containing at least one entry from Ci2(A^)(C22(VV) — AI)~^C2i(iV). Again, 
from (163) we see that for all |A| > 1/2, g{\) can be made smaller than any negative power of logiV by making 
p sufficiently large. Choose p so that |.9(A)| < [KlogN)^'^^^ for these A. It is clear that any p > 8ki + 4 will 
suffice. Let Ai, . . . , A^^ denote the eigenvalues of Cn. Since p(Cii) < 1, we see that for |A| > {KlogN)~'^, we 
have 

|det(Cii(Ar)-AI)| = in(Ai-A)| (173) 

> {KlogN)-^''^ (174) 

Thus with /(A) = det(Cii(A'') — AI), a polynomial, and ^(A) being a rational function, we have the conditions of 
Lenmia 14 being met on any rectangle C, with vertical lines going through {{KlogN)~'^, 0) and {Ki{logN)^, 0). 
Therefore, since /(A) has no zeros inside C, neither does det(C(A'') — AI). Thus we get (166). 
As before we see that 

\aijm < c]i\N)c°y\N) (175) 
Therefore, from (165), (166), and Lemmas 10 and 11, we have for all N large 

For these N we have then I — A(iV) invertible, and so 

ejv-eO =(I-A(Ar))-iw« (177) 
By (59) and (67) we have the entries of A(A'^) bounded by Ki log* N. Notice also, from (176) 

^2iog8 7v|^l + :^^^0_l^ j >{2KHog'N)-^ (178) 

When considering the inverse of a square matrix in terms of its adjoint divided by its determinant, we see that 
the entries of (I — A(A^))~^ are bounded by 

Therefore, since p > 12K—7 (> 8fci +4), (152) follows on this reahzation, an event which occurs with probability 
one. 

Letting = tr D°~^, we have 

mjv - m% = i^{eN - e^) (180) 

where 7 = (71 , . . . , 7^^ )^ with 

^. = f ^pT.. . trD-^R,-D°-^ 

J {l + CNre^,j){l + c^re%y^ N ^'^'^ 

From (59) and (67) we get each < CAr|zpw^^ log^ A^. Therefore from (152) and the fact that w'^ — >■ 0, we 
have 

mN-m%^0 (182) 

almost surely, as AT ^ 00. 

This completes the proof. ■ 

Appendix B 
Proof of Theorem 2 

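We first prove that $V_N^\circ(x)$ as defined in Equation (24) verifies

$$ V_N^\circ(x) = \int_{1/x}^{+\infty}\left( \frac{1}{w} - m_N^\circ(-w) \right)dw \tag{183} $$

and then we prove that, under the conditions of Theorem 2, $V_N^\circ(x)$ defined as such verifies

$$ V_N^\circ(x) - V_N(x) \xrightarrow{\text{a.s.}} 0 \tag{184} $$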
A. Proof of (183) 

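First, observe that we can rewrite $e_i(z)$ under the symmetric form

$$ e_i(z) = \frac{1}{N}\,\mathrm{tr}\,\mathbf{R}_i\left( -z\left[ \mathbf{I}_N + \sum_{k=1}^{K}\delta_k(z)\mathbf{R}_k \right] \right)^{-1} \tag{185} $$

$$ \delta_i(z) = \frac{1}{n_i}\,\mathrm{tr}\,\mathbf{T}_i\left( -z\left[ \mathbf{I}_{n_i} + c_i e_i(z)\mathbf{T}_i \right] \right)^{-1} \tag{186} $$

and then for $m_N^\circ(z)$,

$$ m_N^\circ(z) = \frac{1}{N}\,\mathrm{tr}\left( -z\left[ \mathbf{I}_N + \sum_{k=1}^{K}\delta_k(z)\mathbf{R}_k \right] \right)^{-1} \tag{187} $$

Now, notice that

$$ \frac{1}{z} - m_N^\circ(-z) = \frac{1}{N}\,\mathrm{tr}\left( (z\mathbf{I}_N)^{-1} - \left( z\left[ \mathbf{I}_N + \sum_{k=1}^{K}\delta_k(-z)\mathbf{R}_k \right] \right)^{-1} \right) \tag{188} $$

$$ = \sum_{k=1}^{K}\delta_k(-z)\cdot e_k(-z) \tag{189} $$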



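Since the Shannon transform $V(x)$ satisfies $V(x) = \int_{1/x}^{+\infty}[w^{-1} - m_N(-w)]\,dw$, we need to find an integral form
for $\sum_{k=1}^{K}\delta_k(-z)\cdot e_k(-z)$. Notice now that

$$ \frac{d}{dz}\frac{1}{N}\log\det\left( \mathbf{I}_N + \sum_{k=1}^{K}\delta_k(-z)\mathbf{R}_k \right) = -z\sum_{k=1}^{K}e_k(-z)\cdot\delta_k'(-z) $$

$$ \frac{d}{dz}\frac{1}{N}\log\det\left( \mathbf{I}_{n_k} + c_k e_k(-z)\mathbf{T}_k \right) = -z\cdot e_k'(-z)\cdot\delta_k(-z) \tag{190} $$

$$ \frac{d}{dz}\left( z\sum_{k=1}^{K}\delta_k(-z)e_k(-z) \right) = \sum_{k=1}^{K}\delta_k(-z)e_k(-z) - z\sum_{k=1}^{K}\big( \delta_k'(-z)\cdot e_k(-z) + \delta_k(-z)\cdot e_k'(-z) \big) \tag{191} $$

Combining the last three lines, we have

$$ \sum_{k=1}^{K}\delta_k(-z)e_k(-z) = \frac{d}{dz}\left[ -\frac{1}{N}\log\det\left( \mathbf{I}_N + \sum_{k=1}^{K}\delta_k(-z)\mathbf{R}_k \right) - \sum_{k=1}^{K}\frac{1}{N}\log\det\left( \mathbf{I}_{n_k} + c_k e_k(-z)\mathbf{T}_k \right) + z\sum_{k=1}^{K}\delta_k(-z)e_k(-z) \right] \tag{192} $$

which after integration leads to

$$ \int_{z}^{+\infty}\left( \frac{1}{w} - m_N^\circ(-w) \right)dw = \frac{1}{N}\log\det\left( \mathbf{I}_N + \sum_{k=1}^{K}\delta_k(-z)\mathbf{R}_k \right) + \sum_{k=1}^{K}\frac{1}{N}\log\det\left( \mathbf{I}_{n_k} + c_k e_k(-z)\mathbf{T}_k \right) - z\sum_{k=1}^{K}\delta_k(-z)e_k(-z) \tag{193} $$

which is exactly the right-hand side of (24).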



B. Proof of (184) 

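Consider now the existence of a nonrandom $\alpha$ and for each $N$ a non-negative integer $r_N$ for which

$$\max_{i \le K} \max\left( \lambda^{\mathbf{T}_i}_{r_N+1},\ \lambda^{\mathbf{R}_i}_{r_N+1} \right) \le \alpha$$

(eigenvalues also arranged in non-increasing order). Then for each $i$

$$\lambda^{\mathbf{R}_i^{1/2}\mathbf{X}_i\mathbf{T}_i\mathbf{X}_i^{\mathsf H}\mathbf{R}_i^{1/2}}_{2r_N+1} \le \alpha^2 \|\mathbf{X}_i\mathbf{X}_i^{\mathsf H}\| \tag{194--196}$$

and then we have, from Lemma 15,

$$\lambda^{\mathbf{B}_N}_{2Kr_N+1} \le \alpha^2\left( \|\mathbf{X}_1\mathbf{X}_1^{\mathsf H}\| + \cdots + \|\mathbf{X}_K\mathbf{X}_K^{\mathsf H}\| \right) \tag{197}$$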

We can in fact consider that the spectral norms of the $\mathbf{X}_i$ are bounded in the limit. This holds either under Gaussian assumptions
on the components, or under a finite fourth moment, provided the entries all come from doubly infinite arrays (remember though that we
need the right-unitary invariance structure of $\mathbf{X}_i$). Because of assumption 5 in Corollary 1, we can, by enlarging
the sample space, assume each $\mathbf{X}_i$ is embedded in an $N \times n'$ matrix $\mathbf{X}'_i$, where $N/n' \to a$ as $N \to \infty$. Then,
with probability one (see e.g. [34]),

$$\limsup_N \lambda^{\mathbf{B}_N}_{2Kr_N+1} \le \limsup_N \alpha^2\left( \|\mathbf{X}'_1\mathbf{X}'^{\mathsf H}_1\| + \cdots + \|\mathbf{X}'_K\mathbf{X}'^{\mathsf H}_K\| \right) \le \alpha^2 K (b/a)(1 + \sqrt{a})^2 \tag{198}$$

Let $a^\circ$ be any real greater than $\alpha^2 K (b/a)(1 + \sqrt{a})^2$.

Since $\mathbf{S} = 0$ here, it follows as in [6] that $\{F^{\mathbf{B}_N}\}$ is almost surely tight. Let $F_N^\circ$ denote the distribution function
having Stieltjes transform $m_N^\circ$, and let $f$ on $[0, \infty)$ be a continuous function. Then the function

$$f_{a^\circ}(x) = \begin{cases} f(x), & x \le a^\circ \\ f(a^\circ), & x > a^\circ \end{cases} \tag{199}$$

is bounded and continuous. Therefore, with probability 1,

$$\int f_{a^\circ}(x)\,dF^{\mathbf{B}_N}(x) - \int f_{a^\circ}(x)\,dF_N^\circ(x) \to 0 \tag{200}$$

as $N \to \infty$.

Suppose now $r_N = o(N)$. Then, since almost surely there are at most $2Kr_N$ eigenvalues greater than $a^\circ$ for all
$N$ large, any converging subsequence of $\{F_N^\circ\}$ must have some mass lying on $[0, a^\circ]$. This implies, with probability
1,

$$\frac{1}{N}\sum_{\lambda_i \le a^\circ} f(\lambda_i) - \int f(x)\,dF_N^\circ(x) \to 0 \tag{201}$$

as $N \to \infty$.

Let $b_N$ be a bound on the spectral norms of the $\mathbf{T}_i$ and $\mathbf{R}_i$. Then

$$\|\mathbf{B}_N\| \le b_N^2\left( \|\mathbf{X}'_1\mathbf{X}'^{\mathsf H}_1\| + \cdots + \|\mathbf{X}'_K\mathbf{X}'^{\mathsf H}_K\| \right) \tag{202}$$

Fix a number $\beta > K(b/a)(1 + \sqrt{a})^2$, and let $\alpha_N = b_N^2\beta$. Suppose also that $f$ is increasing and that $f(\alpha_N) r_N = o(N)$. Then

$$\int f(x)\,dF^{\mathbf{B}_N}(x) - \frac{1}{N}\sum_{\lambda_i \le a^\circ} f(\lambda_i) \to 0 \tag{203}$$

almost surely, as $N \to \infty$. Therefore, with probability 1,

$$\int f(x)\,dF^{\mathbf{B}_N}(x) - \int f(x)\,dF_N^\circ(x) \to 0 \tag{204}$$

as $N \to \infty$.

For any $N$ we consider, for $j = 1, 2, \ldots$, the $jN \times jN$ matrix $\mathbf{B}_{N,j}$ formed, as before, from block diagonal
matrices and $jN \times jn_i$ matrices of i.i.d. variables. Then with probability 1, $F^{\mathbf{B}_{N,j}}$ converges weakly to $F_N^\circ$ as
$j \to \infty$. Properties on the eigenvalues of $\mathbf{B}_{N,j}$ will thus yield properties of $F_N^\circ$.

By considering the bound on $\|\mathbf{B}_{N,j}\|$ analogous to (202), we must have $F_N^\circ(\alpha_N) = 1$ for all $N$ large.

Similar to (198) we see that, with probability 1

limsupA^^/^„+i < a2((l + y^f + ... + (! + (205) 
j 

this latter number being less than a° for all N large. 

At this point we will use the fact that for probability measures $P_N$, $P$ on $\mathbb{R}$ with $P_N$ converging weakly to $P$,
we have (see e.g. [36])

$$\liminf_N P_N(G) \ge P(G) \tag{206}$$

for any open set $G$. Thus, with $G = (a^\circ, \infty)$ we see that, with probability 1, for all $N$ large

$$P_N^\circ((a^\circ, \infty)) = 1 - F_N^\circ(a^\circ) \le \liminf_j P^{\mathbf{B}_{N,j}}((a^\circ, \infty)) \tag{207}$$

$$\le 2Kr_N/N \tag{208}$$

Therefore, for all $N$ large

$$\int_{(a^\circ, \infty)} f(x)\,dF_N^\circ(x) \le f(\alpha_N)\,2Kr_N/N \to 0 \tag{209}$$

as $N \to \infty$.

Therefore, we conclude that $\int f(x)\,dF_N^\circ(x)$ is bounded, and with probability 1

$$\int f(x)\,dF^{\mathbf{B}_N}(x) - \int f(x)\,dF_N^\circ(x) \to 0 \tag{210}$$

as $N \to \infty$. This concludes the proof.

Appendix C 
Proof of Proposition 2 

The proof stems from the following result. 

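Proposition 4: $f(\mathbf{P}_1, \ldots, \mathbf{P}_K)$ is a strictly concave matrix function in the Hermitian nonnegative definite matrices $\mathbf{P}_1, \ldots, \mathbf{P}_K$,
if and only if, for any couples $(\mathbf{P}_{1a}, \mathbf{P}_{1b}), \ldots, (\mathbf{P}_{Ka}, \mathbf{P}_{Kb})$ of Hermitian nonnegative definite matrices, the function

$$\phi(\lambda) = f\left( \lambda\mathbf{P}_{1a} + (1 - \lambda)\mathbf{P}_{1b}, \ldots, \lambda\mathbf{P}_{Ka} + (1 - \lambda)\mathbf{P}_{Kb} \right) \tag{211}$$

is strictly concave.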

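Let us use a similar notation as in (217) of the capacity,

$$I(\lambda) = I\left( \lambda\mathbf{P}_{1a} + (1 - \lambda)\mathbf{P}_{1b}, \ldots, \lambda\mathbf{P}_{|\mathcal{S}|a} + (1 - \lambda)\mathbf{P}_{|\mathcal{S}|b} \right) \tag{212}$$

and consider a set $(\delta_k, e_k, \mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|})$ which satisfies the system of equations (217)-(219). Then, from remark
(222) and (223),

$$\frac{dI}{d\lambda} = \sum_{k \in \mathcal{S}} \left( \frac{\partial V}{\partial \delta_k}\frac{\partial \delta_k}{\partial \lambda} + \frac{\partial V}{\partial e_k}\frac{\partial e_k}{\partial \lambda} \right) + \frac{\partial V}{\partial \lambda} \tag{213}$$

$$= \frac{\partial V}{\partial \lambda} \tag{214}$$

where

$$V : (\delta_1, \ldots, \delta_{|\mathcal{S}|}, e_1, \ldots, e_{|\mathcal{S}|}, \lambda) \mapsto I(\lambda). \tag{215}$$

Mere derivations of $V$ lead then to

$$\frac{\partial^2 V}{\partial \lambda^2} = -\sum_{i \in \mathcal{S}} \frac{(c_i e_i)^2}{N} \operatorname{tr}\left[ \left( (\mathbf{I} + c_i e_i \mathbf{R}_i \mathbf{P}_i)^{-1} \mathbf{R}_i (\mathbf{P}_{ia} - \mathbf{P}_{ib}) \right)^2 \right] \tag{216}$$

Since $e_i > 0$ on the strictly negative real axis, if any of the $\mathbf{R}_i$'s is positive definite, then, for all nonnegative
definite couples $(\mathbf{P}_{ia}, \mathbf{P}_{ib})$ such that $\mathbf{P}_{ia} \ne \mathbf{P}_{ib}$, $I'' < 0$. Then, from Proposition 4, the deterministic approximate
on the right-hand side of (29) is strictly concave in $\mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|}$ if any of the $\mathbf{R}_i$ matrices is invertible.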



Appendix D 
Proof of Proposition 3 

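The proof of Proposition 3 recalls the proof from [26], Proposition 2. We essentially need to show that, at point
$(\delta_1^*, \ldots, \delta_{|\mathcal{S}|}^*, e_1^*, \ldots, e_{|\mathcal{S}|}^*)$, the derivative of (29) along any $\mathbf{Q}_k$ is the same whether the $\delta_k^*$ and the $e_k^*$ are fixed or
vary with $\mathbf{Q}_k$. In other words, using the form (193) for the capacity, let us define the functions

$$V^\circ(\mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|}) = \sum_{k \in \mathcal{S}} \frac{1}{N}\log\det\left( \mathbf{I}_{n_k} + c_k e_k \mathbf{R}_k \mathbf{P}_k \right) + \frac{1}{N}\log\det\left( \mathbf{I}_N + \sum_{k \in \mathcal{S}} \delta_k \mathbf{T}_k \right) - \sigma^2 \sum_{k \in \mathcal{S}} \delta_k e_k \tag{217}$$

where

$$e_i = e_i(\mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|}) = \frac{1}{N}\operatorname{tr}\mathbf{T}_i \left( \sigma^2\left[ \mathbf{I}_N + \sum_{k \in \mathcal{S}} \delta_k \mathbf{T}_k \right] \right)^{-1} \tag{218}$$

$$\delta_i = \delta_i(\mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|}) = \frac{1}{n_i}\operatorname{tr}\mathbf{R}_i\mathbf{P}_i \left( \sigma^2\left[ \mathbf{I}_{n_i} + c_i e_i \mathbf{R}_i \mathbf{P}_i \right] \right)^{-1} \tag{219}$$

and $V : (\mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|}, \delta_1, \ldots, \delta_{|\mathcal{S}|}, e_1, \ldots, e_{|\mathcal{S}|}) \mapsto V^\circ(\mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|})$. Then we need only prove that, for all $k \in \mathcal{S}$,

$$\frac{\partial V}{\partial \delta_k}(\mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|}, \delta_1^*, \ldots, \delta_{|\mathcal{S}|}^*, e_1^*, \ldots, e_{|\mathcal{S}|}^*) = 0 \tag{220}$$

$$\frac{\partial V}{\partial e_k}(\mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|}, \delta_1^*, \ldots, \delta_{|\mathcal{S}|}^*, e_1^*, \ldots, e_{|\mathcal{S}|}^*) = 0 \tag{221}$$

Remark then that

$$\frac{\partial V}{\partial \delta_k}(\mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|}, \delta_1, \ldots, \delta_{|\mathcal{S}|}, e_1, \ldots, e_{|\mathcal{S}|}) = \frac{1}{N}\operatorname{tr}\left[ \left( \mathbf{I}_N + \sum_{i \in \mathcal{S}} \delta_i \mathbf{T}_i \right)^{-1} \mathbf{T}_k \right] - \sigma^2 e_k \tag{222}$$

$$\frac{\partial V}{\partial e_k}(\mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|}, \delta_1, \ldots, \delta_{|\mathcal{S}|}, e_1, \ldots, e_{|\mathcal{S}|}) = c_k \frac{1}{N}\operatorname{tr}\left[ \left( \mathbf{I}_{n_k} + c_k e_k \mathbf{R}_k \mathbf{P}_k \right)^{-1} \mathbf{R}_k \mathbf{P}_k \right] - \sigma^2 \delta_k \tag{223}$$

both being null whenever, for all $k$, $e_k = e_k(-\sigma^2, \mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|})$ and $\delta_k = \delta_k(-\sigma^2, \mathbf{P}_1, \ldots, \mathbf{P}_{|\mathcal{S}|})$, which is true in
particular for the unique power optimal solution $\mathbf{P}_1^*, \ldots, \mathbf{P}_{|\mathcal{S}|}^*$ whenever $e_k = e_k^*$ and $\delta_k = \delta_k^*$.

When, for all $k$, $e_k = e_k^*$, $\delta_k = \delta_k^*$, the maximum of $V$ over the $\mathbf{P}_k$ is then obtained by maximizing the
expressions $\log\det(\mathbf{I}_{n_k} + c_k e_k^* \mathbf{R}_k \mathbf{P}_k)$ over $\mathbf{P}_k$. From the inequality (see e.g. [2])

$$\det\left( \mathbf{I}_{n_k} + c_k e_k^* \mathbf{R}_k \mathbf{P}_k \right) \le \prod_{i=1}^{n_k} \left( \mathbf{I}_{n_k} + c_k e_k^* \mathbf{R}_k \mathbf{P}_k \right)_{ii} \tag{224}$$

where, only here, we denote $(\mathbf{X})_{ii}$ the entry $(i, i)$ of matrix $\mathbf{X}$. The equality is obtained if and only if $\mathbf{I}_{n_k} + c_k e_k^* \mathbf{R}_k \mathbf{P}_k$
is diagonal. The equality case arises for $\mathbf{P}_k$ and $\mathbf{R}_k = \mathbf{U}_k \mathbf{D}_k \mathbf{U}_k^{\mathsf H}$ co-diagonalizable. In this case, denoting $\mathbf{P}_k = \mathbf{U}_k \mathbf{Q}_k \mathbf{U}_k^{\mathsf H}$,
the entries of $\mathbf{Q}_k$, constrained by $\frac{1}{n_k}\operatorname{tr}(\mathbf{Q}_k) = P_k$, are solutions of the classical optimization problem
under constraint,

$$\sup_{\substack{\mathbf{Q}_k \\ \frac{1}{n_k}\operatorname{tr}(\mathbf{Q}_k) \le P_k}} \log\det\left( \mathbf{I}_{n_k} + c_k e_k^* \mathbf{Q}_k \mathbf{D}_k \right) \tag{225}$$

whose solution is given by the classical water-filling algorithm. Hence (35).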
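
The water-filling step behind (225) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `water_filling`, the argument `gains` (standing for the products $c_k e_k^* d_{k,i}$, with $d_{k,i}$ the diagonal entries of $\mathbf{D}_k$) and the per-antenna budget `power` (standing for $P_k$ under the $\frac{1}{n_k}\operatorname{tr}$ normalization) are hypothetical names introduced here.

```python
def water_filling(gains, power):
    """Maximise sum_i log(1 + gains[i] * q[i]) over q[i] >= 0 with mean(q) <= power.

    The optimum has the form q[i] = max(mu - 1/gains[i], 0), where the water
    level mu is set so that the whole budget n * power is spent.
    """
    n = len(gains)
    q = [0.0] * n
    # Indices with a usable (strictly positive) gain, strongest first.
    order = sorted((i for i in range(n) if gains[i] > 0),
                   key=lambda i: gains[i], reverse=True)
    total = n * power
    # Try the largest active set first, dropping the weakest mode each time.
    for m in range(len(order), 0, -1):
        active = order[:m]
        mu = (total + sum(1.0 / gains[i] for i in active)) / m
        # Accept once the weakest active mode still receives positive power.
        if mu - 1.0 / gains[active[-1]] > 0:
            for i in active:
                q[i] = mu - 1.0 / gains[i]
            break
    return q


# Hypothetical example: three eigen-directions, unit average power.
allocation = water_filling([2.0, 1.0, 0.1], 1.0)
```

In the iterative scheme described in the text, such a routine would be called once per user $k$ with the current values of $e_k^*$, after which the $e_k$ and $\delta_k$ are recomputed.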



Appendix E 
Proof of Proposition 1 

The convergence of the fixed-point algorithm follows the same line of proof as the uniqueness in Section A-E.
Instead of proving the convergence of the algorithm at $z = -\sigma^2$, we start by proving the convergence for $z \in \mathbb{C}^+$.
If one considers the difference $\mathbf{e}^{n+1} - \mathbf{e}^n$, where $\mathbf{e}^n = (e_1^n, \ldots, e_K^n)$, instead of the difference between two solutions, the same development as
in Section A-E leads to

$$\mathbf{e}^{n+1} - \mathbf{e}^n = \mathbf{A}_n(\mathbf{e}^n - \mathbf{e}^{n-1}) \tag{226}$$

for $n \ge 1$, where $\mathbf{A}_n$ is defined, similarly as in (147), as $\mathbf{A}_n = (a^n_{ij})$, with $a^n_{ij}$ defined by

T 



a^j = -^trRiD„_i-iR,D„-ic,y 



.2 



(227) 



From the Cauchy-Schwarz inequality, and the different bounds on the $\mathbf{D}_n$, $\mathbf{R}_k$ and $\mathbf{T}_k$ matrices used so far, we have



with u = Denoting co = niax(cj), we then have that 



(228) 



(229) 
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Let < £ < 1, and take now a countable set {^i, Z2, ■ ■ ■}, Vk = ^[zk], such that Jfi^^j^^^s^ < 1 - £ for 
all $z_k$ (this is possible by letting $v_k > 0$ be large enough). On this countable set, the sequences $\{\mathbf{e}^n\}$ are therefore
Cauchy sequences on $\mathbb{C}^K$: they all converge. Since the $e_j^n$ are holomorphic and bounded on every compact set
included in $\mathbb{C} \setminus \mathbb{R}^+$, from Vitali's convergence theorem [15], the function $e_j^n(z)$ converges on such compact sets.
Now, for $z = -\sigma^2$, from the fact that we forced the initialization step to be $e_j^0 = 1/\sigma^2$, $e_j^0$ is the Stieltjes transform
of a distribution function at point $z = -\sigma^2$. It now suffices to verify that, if $e_j^n$ is the Stieltjes transform of a
distribution function at point $z$, then so is $e_j^{n+1}$. This requires verifying that $z \in \mathbb{C}^+$, $e_j^n \in \mathbb{C}^+$ implies $e_j^{n+1} \in \mathbb{C}^+$;
that $z \in \mathbb{C}^+$, $z e_j^n \in \mathbb{C}^+$ implies $z e_j^{n+1} \in \mathbb{C}^+$; and that $\lim_{y \to \infty} -y\, e_j^n(iy) < \infty$ implies $\lim_{y \to \infty} -y\, e_j^{n+1}(iy) < \infty$. This
follows directly from the definition of $e_j^n$. From the dominated convergence theorem, we then also have that the
limit of $e_j^n$ is a Stieltjes transform that is solution to (12). From the uniqueness of the Stieltjes transform solution
to (12) (this follows from the pointwise uniqueness on $\mathbb{C}^+$ and the fact that the Stieltjes transform is holomorphic
on all compact sets of $\mathbb{C} \setminus \mathbb{R}^+$), we then have that $e_j^n$ converges for all $j$ and $z \in \mathbb{C} \setminus \mathbb{R}^+$, if $e_j^0$ is initialized at a
Stieltjes transform. The choice $z = -\sigma^2$, $e_j^0 = 1/\sigma^2$ follows this rule and the fixed-point algorithm converges to
the correct solution.
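
As an illustration of the recursion analysed above, the following sketch iterates the fixed-point equations at $z = -\sigma^2$ from the initialization $e_k^0 = 1/\sigma^2$, written in the symmetric form (185)-(186), and then evaluates the right-hand side of (193). It is only a sketch under a simplifying assumption made here: every $\mathbf{R}_k$ is taken diagonal, so that traces reduce to sums over diagonal entries (for $\mathbf{T}_k$ only the eigenvalues matter). The function names and argument layout are hypothetical.

```python
import math


def solve_fixed_point(r, t, sigma2, max_iter=10_000, tol=1e-14):
    """r[k]: the N diagonal entries of R_k; t[k]: the n_k eigenvalues of T_k.

    Returns (e, delta, c) with c[k] = N / n_k.
    """
    K, N = len(r), len(r[0])
    c = [N / len(t[k]) for k in range(K)]

    def delta_of(e):
        # delta_k = (1/n_k) tr T_k (sigma2 [I + c_k e_k T_k])^{-1}
        return [sum(tau / (sigma2 * (1.0 + c[k] * e[k] * tau)) for tau in t[k])
                / len(t[k]) for k in range(K)]

    def e_of(delta):
        # e_i = (1/N) tr R_i (sigma2 [I + sum_k delta_k R_k])^{-1}, R_k diagonal
        load = [1.0 + sum(delta[k] * r[k][n] for k in range(K)) for n in range(N)]
        return [sum(r[i][n] / (sigma2 * load[n]) for n in range(N)) / N
                for i in range(K)]

    e = [1.0 / sigma2] * K  # initialization e_k^0 = 1 / sigma^2
    for _ in range(max_iter):
        e_next = e_of(delta_of(e))
        converged = max(abs(a - b) for a, b in zip(e_next, e)) < tol
        e = e_next
        if converged:
            break
    return e, delta_of(e), c


def shannon_transform(r, t, sigma2):
    """Right-hand side of (193) evaluated at x = sigma2 (diagonal R_k assumed)."""
    e, delta, c = solve_fixed_point(r, t, sigma2)
    K, N = len(r), len(r[0])
    term_r = sum(math.log(1.0 + sum(delta[k] * r[k][n] for k in range(K)))
                 for n in range(N)) / N
    term_t = sum(math.log(1.0 + c[k] * e[k] * tau)
                 for k in range(K) for tau in t[k]) / N
    return term_r + term_t - sigma2 * sum(d * x for d, x in zip(delta, e))
```

For a single user with $\mathbf{R}_1 = \mathbf{T}_1 = \mathbf{I}$, $N = n_1$ and $\sigma^2 = 1$, the two equations collapse to $e = 1/(1 + \delta)$ and $\delta = 1/(1 + e)$, which gives a quick sanity check of the iteration.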

Appendix F 
Useful Lemmas 

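In this section, we gather most of the known or new lemmas which are needed in various places in Proof A.
The statements in the following Lemma are well-known.
Lemma 1: 1) For rectangular matrices $\mathbf{A}$, $\mathbf{B}$ of the same size,

$$\operatorname{rank}(\mathbf{A} + \mathbf{B}) \le \operatorname{rank}(\mathbf{A}) + \operatorname{rank}(\mathbf{B}) \tag{230}$$

2) For rectangular matrices $\mathbf{A}$, $\mathbf{B}$ for which $\mathbf{A}\mathbf{B}$ is defined,

$$\operatorname{rank}(\mathbf{A}\mathbf{B}) \le \min(\operatorname{rank}(\mathbf{A}), \operatorname{rank}(\mathbf{B})) \tag{231}$$

3) For rectangular $\mathbf{A}$, $\operatorname{rank}(\mathbf{A})$ is no greater than the number of non-zero entries of $\mathbf{A}$.
Lemma 2: (Lemma 2.4 of [6]) For $N \times N$ Hermitian matrices $\mathbf{A}$ and $\mathbf{B}$,

$$\|F^{\mathbf{A}} - F^{\mathbf{B}}\| \le \frac{1}{N}\operatorname{rank}(\mathbf{A} - \mathbf{B}) \tag{232}$$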

From these two lemmas we get the following. 

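Lemma 3: Let $\mathbf{S}$, $\mathbf{A}$, $\bar{\mathbf{A}}$ be Hermitian $N \times N$, $\mathbf{Q}$, $\bar{\mathbf{Q}}$ both $N \times n$, and $\mathbf{B}$, $\bar{\mathbf{B}}$ both Hermitian $n \times n$. Then
1)

$$\|F^{\mathbf{S} + \mathbf{A}\mathbf{Q}\mathbf{B}\mathbf{Q}^{\mathsf H}\mathbf{A}} - F^{\mathbf{S} + \mathbf{A}\bar{\mathbf{Q}}\mathbf{B}\bar{\mathbf{Q}}^{\mathsf H}\mathbf{A}}\| \le \frac{2}{N}\operatorname{rank}(\mathbf{Q} - \bar{\mathbf{Q}}) \tag{233}$$

2)

$$\|F^{\mathbf{S} + \mathbf{A}\mathbf{Q}\mathbf{B}\mathbf{Q}^{\mathsf H}\mathbf{A}} - F^{\mathbf{S} + \bar{\mathbf{A}}\mathbf{Q}\mathbf{B}\mathbf{Q}^{\mathsf H}\bar{\mathbf{A}}}\| \le \frac{2}{N}\operatorname{rank}(\mathbf{A} - \bar{\mathbf{A}}) \tag{234}$$

and
3)

$$\|F^{\mathbf{S} + \mathbf{A}\mathbf{Q}\mathbf{B}\mathbf{Q}^{\mathsf H}\mathbf{A}} - F^{\mathbf{S} + \mathbf{A}\mathbf{Q}\bar{\mathbf{B}}\mathbf{Q}^{\mathsf H}\mathbf{A}}\| \le \frac{1}{N}\operatorname{rank}(\mathbf{B} - \bar{\mathbf{B}}) \tag{235}$$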



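Lemma 4: For $N \times N$ $\mathbf{A}$, $\tau \in \mathbb{C}$ and $\mathbf{r} \in \mathbb{C}^N$ for which $\mathbf{A}$ and $\mathbf{A} + \tau\mathbf{r}\mathbf{r}^{\mathsf H}$ are invertible,

$$\mathbf{r}^{\mathsf H}(\mathbf{A} + \tau\mathbf{r}\mathbf{r}^{\mathsf H})^{-1} = \frac{1}{1 + \tau\mathbf{r}^{\mathsf H}\mathbf{A}^{-1}\mathbf{r}}\,\mathbf{r}^{\mathsf H}\mathbf{A}^{-1} \tag{236}$$

This result follows from $\mathbf{r}^{\mathsf H}\mathbf{A}^{-1}(\mathbf{A} + \tau\mathbf{r}\mathbf{r}^{\mathsf H}) = (1 + \tau\mathbf{r}^{\mathsf H}\mathbf{A}^{-1}\mathbf{r})\,\mathbf{r}^{\mathsf H}$.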
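
A quick numerical sanity check of identity (236) can be written as below. This is only an illustrative sketch using NumPy; the helper name `rank_one_inverse_gap`, the random test matrix and the chosen value of $\tau$ are assumptions made here and do not come from the original text.

```python
import numpy as np


def rank_one_inverse_gap(A, r, tau):
    """Largest entrywise gap between the two sides of (236)."""
    rH = r.conj()                      # r^H as a 1-D array (row vector)
    A_inv = np.linalg.inv(A)
    lhs = rH @ np.linalg.inv(A + tau * np.outer(r, rH))
    rhs = (rH @ A_inv) / (1.0 + tau * (rH @ A_inv @ r))
    return float(np.max(np.abs(lhs - rhs)))


rng = np.random.default_rng(0)
N = 6
# A diagonally shifted complex matrix, used here only to obtain an invertible A.
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)) + 10.0 * np.eye(N)
r = rng.standard_normal(N) + 1j * rng.standard_normal(N)
gap = rank_one_inverse_gap(A, r, tau=0.7 - 0.2j)
```

The gap should be at the level of floating-point round-off whenever both $\mathbf{A}$ and $\mathbf{A} + \tau\mathbf{r}\mathbf{r}^{\mathsf H}$ are well conditioned.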
Moreover, we recall Lemma 2.6 of [6] 

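Lemma 5: Let $z \in \mathbb{C}^+$ with $v = \Im[z]$, $\mathbf{A}$ and $\mathbf{B}$ $N \times N$ with $\mathbf{B}$ Hermitian, and $\mathbf{r} \in \mathbb{C}^N$. Then

$$\left| \operatorname{tr}\left( (\mathbf{B} - z\mathbf{I}_N)^{-1} - (\mathbf{B} + \mathbf{r}\mathbf{r}^{\mathsf H} - z\mathbf{I}_N)^{-1} \right)\mathbf{A} \right| = \left| \frac{\mathbf{r}^{\mathsf H}(\mathbf{B} - z\mathbf{I}_N)^{-1}\mathbf{A}(\mathbf{B} - z\mathbf{I}_N)^{-1}\mathbf{r}}{1 + \mathbf{r}^{\mathsf H}(\mathbf{B} - z\mathbf{I}_N)^{-1}\mathbf{r}} \right| \le \frac{\|\mathbf{A}\|}{v}. \tag{237}$$

If $z < 0$, we also have

$$\left| \operatorname{tr}\left( (\mathbf{B} - z\mathbf{I}_N)^{-1} - (\mathbf{B} + \mathbf{r}\mathbf{r}^{\mathsf H} - z\mathbf{I}_N)^{-1} \right)\mathbf{A} \right| \le \frac{\|\mathbf{A}\|}{|z|}. \tag{238}$$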
From Lemma 2.2 of [16], and Theorems A.2, A.4, A.5 of [17], we have the following 

Lemma 6: If $f$ is analytic on $\mathbb{C}^+$, both $f(z)$ and $zf(z)$ map $\mathbb{C}^+$ into $\mathbb{C}^+$, and there exists a $\theta \in (0, \pi/2)$ for
which $zf(z) \to c$, finite, as $z \to \infty$ restricted to $\{w : \theta < \arg w < \pi - \theta\}$, then $c < 0$ and $f$ is the Stieltjes
transform of a measure on the nonnegative reals with total mass $-c$.

Also, from [6], we need 

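Lemma 7: Let $\mathbf{y} = (y_1, \ldots, y_N)^{\mathsf T}$ with the $y_i$'s i.i.d. such that $\mathrm{E}y_1 = 0$, $\mathrm{E}|y_1|^2 = 1$ and $|y_1| \le \log N$, and $\mathbf{A}$ an
$N \times N$ matrix independent of $\mathbf{y}$, then

$$\mathrm{E}\left| \mathbf{y}^{\mathsf H}\mathbf{A}\mathbf{y} - \operatorname{tr}\mathbf{A} \right|^6 \le K\|\mathbf{A}\|^6 N^3 \log^{12} N \tag{239}$$

where $K$ does not depend on $N$, $\mathbf{A}$, nor on the distribution of $y_1$.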
Additionally, we need 

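Lemma 8: Let $\mathbf{D} = \mathbf{A} + i\mathbf{B} + iv\mathbf{I}$, where $\mathbf{A}$, $\mathbf{B}$ are $N \times N$ Hermitian, $\mathbf{B}$ is also positive semi-definite, and
$v > 0$. Then $\|\mathbf{D}^{-1}\| \le v^{-1}$.

Proof: We have $\mathbf{D}\mathbf{D}^{\mathsf H} = (\mathbf{A} + i\mathbf{B})(\mathbf{A} - i\mathbf{B}) + v^2\mathbf{I} + 2v\mathbf{B}$. Therefore the eigenvalues of $\mathbf{D}\mathbf{D}^{\mathsf H}$ are greater or
equal to $v^2$, which implies the singular values of $\mathbf{D}$ are greater or equal to $v$, so that the singular values of $\mathbf{D}^{-1}$
are less or equal to $v^{-1}$. We therefore get our result. ■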
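
The bound of Lemma 8 is easy to probe numerically. The sketch below, using NumPy, builds a random Hermitian $\mathbf{A}$ and a random positive semi-definite $\mathbf{B}$ and compares the spectral norm of $\mathbf{D}^{-1}$ with $v^{-1}$; the helper name `inverse_norm_and_bound` and the random test matrices are assumptions introduced for illustration only.

```python
import numpy as np


def inverse_norm_and_bound(A, B, v):
    """Return (spectral norm of D^{-1}, 1/v) for D = A + iB + ivI."""
    D = A + 1j * B + 1j * v * np.eye(A.shape[0])
    return float(np.linalg.norm(np.linalg.inv(D), 2)), 1.0 / v


rng = np.random.default_rng(1)
N = 8
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
A = (G + G.conj().T) / 2.0          # Hermitian
H = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
B = H @ H.conj().T                  # Hermitian positive semi-definite
norm_inv, bound = inverse_norm_and_bound(A, B, v=0.3)
```

Taking $\mathbf{A} = \mathbf{B} = 0$ gives $\mathbf{D} = iv\mathbf{I}$, for which the bound is attained with equality.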

From Theorem 2.1 of [29], 

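Lemma 9: Let $\rho(\mathbf{C})$ denote the spectral radius of the $N \times N$ matrix $\mathbf{C}$ (the largest of the absolute values of the
eigenvalues of $\mathbf{C}$). If $\mathbf{x}, \mathbf{b} \in \mathbb{R}^N$ with the components of $\mathbf{C}$, $\mathbf{x}$, and $\mathbf{b}$ all positive, then the equation $\mathbf{x} = \mathbf{C}\mathbf{x} + \mathbf{b}$
implies $\rho(\mathbf{C}) < 1$.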

From Theorem 8.1.18 of [30], 

Lemma 10: Suppose $\mathbf{A} = (a_{ij})$ and $\mathbf{B} = (b_{ij})$ are $N \times N$ with $b_{ij}$ nonnegative and $|a_{ij}| \le b_{ij}$. Then

$$\rho(\mathbf{A}) \le \rho((|a_{ij}|)) \le \rho(\mathbf{B}) \tag{240}$$

Also, from Lemma 5.7.9 of [31],

Lemma 11: Let $\mathbf{A} = (a_{ij})$ and $\mathbf{B} = (b_{ij})$ be $N \times N$ with $a_{ij}$, $b_{ij}$ nonnegative. Then

$$\rho\left( (a_{ij}^{1/2} b_{ij}^{1/2}) \right) \le (\rho(\mathbf{A}))^{1/2}(\rho(\mathbf{B}))^{1/2} \tag{241}$$

And, Theorems 8.2.2 and 8.3.1 of [30],

Lemma 12: If $\mathbf{C}$ is a square matrix with nonnegative entries, then $\rho(\mathbf{C})$ is an eigenvalue of $\mathbf{C}$ having an
eigenvector $\mathbf{x}$ with nonnegative entries. Moreover, if the entries of $\mathbf{C}$ are all positive, then $\rho(\mathbf{C}) > 0$ and the entries
of $\mathbf{x}$ are all positive.

From [31], we also need Theorem 6.1.1, 

Lemma 13: Gersgorin's Theorem. All the eigenvalues of an $N \times N$ matrix $\mathbf{A} = (a_{ij})$ lie in the union of the $N$
disks in the complex plane, the $i$-th disk having center $a_{ii}$ and radius $\sum_{j \ne i} |a_{ij}|$.
Theorem 3.42 of [15],

Lemma 14: Rouché's Theorem. If $f(z)$ and $g(z)$ are analytic inside and on a closed contour $C$ of the complex
plane, and $|g(z)| < |f(z)|$ on $C$, then $f(z)$ and $f(z) + g(z)$ have the same number of zeros inside $C$.
In order to prove Theorem 2, we also need, from [33] 

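Lemma 15: Consider a rectangular matrix $\mathbf{A}$ and let $s_i^{\mathbf{A}}$ denote the $i$-th largest singular value of $\mathbf{A}$, with $s_i^{\mathbf{A}} = 0$
whenever $i > \operatorname{rank}(\mathbf{A})$. Let $m$, $n$ be arbitrary non-negative integers. Then for $\mathbf{A}$, $\mathbf{B}$ rectangular of the same size

$$s_{m+n+1}^{\mathbf{A}+\mathbf{B}} \le s_{m+1}^{\mathbf{A}} + s_{n+1}^{\mathbf{B}} \tag{242}$$

And for $\mathbf{A}$, $\mathbf{B}$ rectangular for which $\mathbf{A}\mathbf{B}$ is defined

$$s_{m+n+1}^{\mathbf{A}\mathbf{B}} \le s_{m+1}^{\mathbf{A}}\, s_{n+1}^{\mathbf{B}} \tag{243}$$

As a corollary, for any integer $r \ge 0$ and rectangular matrices $\mathbf{A}_1, \ldots, \mathbf{A}_K$, all of the same size,

$$s_{Kr+1}^{\mathbf{A}_1 + \cdots + \mathbf{A}_K} \le s_{r+1}^{\mathbf{A}_1} + \cdots + s_{r+1}^{\mathbf{A}_K} \tag{244}$$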
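
The singular value inequalities (242) and (243) can be checked numerically on random matrices. The following NumPy sketch is illustrative only; the helper names `padded_singular_values`, `fan_sum_holds` and `fan_product_holds`, the tolerance and the random test matrices are assumptions introduced here. Singular values are padded with zeros, in line with the convention $s_i^{\mathbf{A}} = 0$ for $i > \operatorname{rank}(\mathbf{A})$.

```python
import numpy as np


def padded_singular_values(M, length):
    """Singular values of M in non-increasing order, zero-padded to `length`."""
    s = np.linalg.svd(M, compute_uv=False)
    out = np.zeros(length)
    out[: len(s)] = s[:length]
    return out


def fan_sum_holds(A, B, tol=1e-10):
    """Check s_{m+n+1}(A+B) <= s_{m+1}(A) + s_{n+1}(B) for all admissible m, n."""
    L = min(A.shape)
    sA, sB, sS = (padded_singular_values(M, L) for M in (A, B, A + B))
    # Array index m corresponds to the (m+1)-th largest singular value.
    return all(sS[m + n] <= sA[m] + sB[n] + tol
               for m in range(L) for n in range(L - m))


def fan_product_holds(A, B, tol=1e-10):
    """Check s_{m+n+1}(AB) <= s_{m+1}(A) * s_{n+1}(B) for all admissible m, n."""
    L = max(A.shape[0], A.shape[1], B.shape[1])
    sA, sB, sP = (padded_singular_values(M, L) for M in (A, B, A @ B))
    return all(sP[m + n] <= sA[m] * sB[n] + tol
               for m in range(L) for n in range(L - m))


rng = np.random.default_rng(2)
A = rng.standard_normal((4, 6))
B = rng.standard_normal((4, 6))
C = rng.standard_normal((6, 3))
sum_ok = fan_sum_holds(A, B)
product_ok = fan_product_holds(A, C)
```

The corollary (244) follows by applying the sum inequality repeatedly with $m = n = r$ at each step.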
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